Abstract. We present here Wave, the first “hash-and-sign” code-based signature scheme
which strictly follows the GPV strategy [GPV08]. It uses the family of ternary generalized
(U, U + V) codes. We prove that Wave achieves existential unforgeability under adaptive
chosen message attacks (EUF-CMA) in the random oracle model (ROM) with a tight reduction
to two assumptions from coding theory: one is a distinguishing problem that is
related to the trapdoor we insert in our scheme, the other one is DOOM, a multiple-target
version of syndrome decoding. The algorithm produces uniformly distributed signatures
through a suitable rejection sampling. Our scheme enjoys efficient signature and verification
algorithms. For 128 bits of classical security, signatures are 8 thousand bits long and the
public key size is slightly smaller than one megabyte. Furthermore, with our current choice
of parameters, the rejection rate is limited to one rejection every 3 or 4 signatures.


1 Introduction 

Code-Based Signature Schemes. It is a long-standing open problem to build an efficient
and secure digital signature scheme based on the hardness of decoding a linear code which could
compete with widespread schemes like DSA or RSA. Those signature schemes are well known to
be broken by quantum computers, and code-based schemes could indeed provide a valid quantum-resistant
replacement. A first answer to this question was given by the CFS scheme proposed in
[CFS01]. It consisted in signing with the Niederreiter public-key decryption primitive [Nie86]. This
requires a linear code for which there exists an efficient decoding algorithm able to find the closest
codeword for a non-negligible proportion of all words. This means that if $\mathbf{H}$ is an $r \times n$ parity-check
matrix of the code, there exists, for a non-negligible proportion of all $\mathbf{s}$ in $\mathbb{F}_2^r$, an efficient procedure
to decode, that is, to find a word $\mathbf{e}$ in $\mathbb{F}_2^n$ of smallest Hamming weight such that

$$\mathbf{e}\mathbf{H}^{\intercal} = \mathbf{s}. \qquad (1)$$

In such a case we say that $\mathbf{s}$, which is generally called a syndrome in the literature, can be decoded.
[CFS01] achieved this task by using high-rate Goppa codes. This signature scheme followed a
relaxed form of the “hash-and-sign” paradigm. To sign a message $m$, a hash function $h$ is used to
produce a sequence $\mathbf{s}_0, \mathbf{s}_1, \dots$ of elements of $\mathbb{F}_2^r$, for instance $\mathbf{s}_0 = h(m)$ and $\mathbf{s}_i = h(\mathbf{s}_0, i)$ for $i > 0$.
The first $\mathbf{s}_i$ that can be decoded defines the signature of $m$ as the word $\mathbf{e}$ of smallest Hamming
weight such that $\mathbf{e}\mathbf{H}^{\intercal} = \mathbf{s}_i$. This signature scheme has however two drawbacks: (i) for high-rate
Goppa codes, the indistinguishability assumption used in its security proof has been invalidated in
[FGO+11]; (ii) it scales poorly with respect to security. Indeed, a crude extrapolation of parallel
CFS [Fin10] and its implementations [LS12, BCS13] yields, for 128 bits of classical security, a public
key size of several gigabytes and a signature time of several seconds. Those figures even grow to
terabytes and hours for quantum-safe security levels, making the scheme impractical.


Other Code-Based Signature Schemes. Instead of trying to solve a conventional decoding
problem, meaning that we want to find an error of minimum weight satisfying (1), it is enough
in this cryptographic context to solve (1) for an error $\mathbf{e}$ whose weight $w$ is not necessarily
minimal but just sufficiently low, so that problem (1) stays hard for someone who does not
know the secret structure of the code that is used. This approach has been followed in [BBC+13]
with LDGM codes, in [GSJB14] with (essentially) convolutional codes and in the NIST proposal
pqsigRM [LKLN17] with modified Reed-Muller codes. The LDGM scheme was broken in [PT16] and
[GSJB14] was broken in [MP16] (and there are still some doubts that there is a way to
choose the parameters of the scheme [GSJB14] so as to avoid the attack [LT13] on the McEliece
cryptosystem based on convolutional codes [LJ12]).

* This work was supported by the ANR CBCRYPT project, grant ANR-17-CE39-0007 of the French
Agence Nationale de la Recherche.

Other signature schemes based on codes were also given in the literature, such as for instance
the KKS scheme [KKS97, KKS05] or its variants [BMS11, GS12]. But they can be considered
at best to be one-time signature schemes in the light of the attack given in [COV07], and great
care has to be taken to choose the parameters of these schemes, as shown by [OT11], which broke
all the parameters proposed in [KKS97, KKS05, BMS11]. There was also the RaCoSS proposal
to NIST, which was based on a public matrix whose columns are formed by syndromes of low-weight
errors. It was broken in [HBPL18]. Another possibility is to use the Fiat-Shamir heuristic
to turn a zero-knowledge authentication scheme into a signature scheme. When based on the Stern
code-based authentication scheme [Ste93b], this leads however to a signature scheme with really
large signature sizes (of the order of hundred(s) of kilobits). This represents a complete picture of
code-based signature schemes based on the Hamming metric.

There has been some recent progress in this area for another metric, namely the rank metric
[GRSZ14], with the RankSign scheme. This scheme enjoys remarkably small key sizes, of the order of
tens of thousands of bits for 128 bits of security. Unfortunately it was broken in [DT18]. In summary,
it is still a very challenging and open question to come up with an efficient and secure signature 
scheme based on error-correcting codes. 


Our Contribution: a “Hash-and-Sign” Signature Scheme Based on the GPV Approach.

Our scheme is based on the hash-and-sign approach and the GPV strategy [GPV08] to devise
such signature schemes. Recall that the notions put forward in that paper made it possible to build the
first identity-based encryption scheme based on hard problems on lattices. This strategy has
also been adopted in Falcon [FHK+], a lattice-based signature submission to the NIST call for
post-quantum cryptographic primitives. It is based on the notion of preimage sampleable function.
Roughly speaking, this is a family of trapdoor one-way functions $(f_a)_a$ such that, with overwhelming
probability over the choice of the function $f_a$, (i) the distribution of the images $f_a(x)$ is very close
to the uniform distribution over the set of possible outputs, and (ii) the distribution of the output of
the algorithm inverting $f_a$ using the trapdoor is very close to the uniform distribution over the
inputs to $f_a$. In [GPV08] such functions are based on $a$'s that are matrices $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$, whereas the
inputs to $f_{\mathbf{A}}$ are restricted to a subset of elements $\mathbf{e} \in \mathbb{Z}^m$ whose Euclidean norm is bounded by some quantity
$W$, and

$$f_{\mathbf{A}}(\mathbf{e}) = \mathbf{e}\mathbf{A}^{\intercal} \bmod q.$$

Our preimage sampleable function is of the same kind, i.e.

$$f_{\mathbf{H}}(\mathbf{e}) = \mathbf{e}\mathbf{H}^{\intercal},$$

with the only difference that we perform matrix multiplication in the finite field $\mathbb{F}_q$ and the inputs
$\mathbf{e}$ will be restricted to have Hamming weight exactly $w$, where $w$ is chosen such that it is hard to
solve (1) for $\mathbf{e}$'s of such a Hamming weight.

In [GPV08] a signature scheme based on preimage sampleable functions is given that is shown
to be strongly existentially unforgeable under a chosen-message attack if in addition the preimage
sampleable functions are also collision resistant. With our choice of $w$ and $\mathbb{F}_q$, our preimage
sampleable functions are not collision resistant. However, as observed in [GPV08], collision resistance
allows a tight security reduction but is not necessary: a security proof could also be given when
the function is “only” preimage sampleable. We will also get a tight security reduction in our case
by carefully choosing the difficult problems we use in the security proof: one is a distinguishing problem
that is related to the trapdoor we insert in our scheme, the other one is a “multiple instances,
only one solution required” version of the decoding problem (1). This is the so-called “Decoding
One Out of Many” problem (DOOM in short) [Sen11].


Our Trapdoor: Generalized (U, U + V) Codes. In [GPV08] the trapdoor consists in a short
basis of the lattice considered in the construction. Our trapdoor will be of a different nature: it
consists in choosing parity-check matrices of generalized (U, U + V) codes. Not every parity-check
matrix is a parity-check matrix of a generalized (U, U + V) code; however, there are plenty
of such codes. The U and V codes can namely be chosen at random in this construction, and the
number of such codes of dimension $k$ and length $n$ is of order $q^{\Theta(n^2)}$ when $k = \Theta(n)$. A generalized
(U, U + V) code of length $n$ over $\mathbb{F}_q$ has 6 ingredients:

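— Two codes U and V of length $n/2$;

— Four $n/2 \times n/2$ diagonal matrices $\mathbf{D}_1, \dots, \mathbf{D}_4$ over $\mathbb{F}_q$ such that $\mathbf{D} = \begin{pmatrix} \mathbf{D}_1 & \mathbf{D}_2 \\ \mathbf{D}_3 & \mathbf{D}_4 \end{pmatrix}$ is invertible.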

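The generalized (U, U + V) code, which we denote by $(U\mathbf{D}_1 + V\mathbf{D}_2, U\mathbf{D}_3 + V\mathbf{D}_4)$, is defined by

$$(U\mathbf{D}_1 + V\mathbf{D}_2, U\mathbf{D}_3 + V\mathbf{D}_4) = \{(\mathbf{u}\mathbf{D}_1 + \mathbf{v}\mathbf{D}_2, \mathbf{u}\mathbf{D}_3 + \mathbf{v}\mathbf{D}_4) : \mathbf{u} \in U, \mathbf{v} \in V\}.$$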


Standard (U, U + V) codes correspond to $\mathbf{D}_1 = \mathbf{D}_3 = \mathbf{D}_4 = \mathbf{I}_{n/2}$ and $\mathbf{D}_2 = \mathbf{0}_{n/2}$, where $\mathbf{I}_{n/2}$
stands for the identity matrix of size $n/2$ and $\mathbf{0}_{n/2}$ is the $n/2 \times n/2$ zero matrix.
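As an illustration of this definition, the following minimal Python sketch (assuming a prime field, $q = 3$ by default; the helper names are hypothetical) stores each diagonal matrix $\mathbf{D}_i$ as the list of its diagonal entries and forms the codeword $(\mathbf{u}\mathbf{D}_1 + \mathbf{v}\mathbf{D}_2, \mathbf{u}\mathbf{D}_3 + \mathbf{v}\mathbf{D}_4)$ from $\mathbf{u} \in U$ and $\mathbf{v} \in V$. Because the four blocks are diagonal, $\mathbf{D}$ is invertible exactly when every $2 \times 2$ block $\begin{pmatrix} d_{1,j} & d_{2,j} \\ d_{3,j} & d_{4,j} \end{pmatrix}$ has non-zero determinant, which is what `is_invertible` checks.

```python
Q = 3  # assumed field size (prime), matching the ternary instantiation

def is_invertible(d1, d2, d3, d4, q=Q):
    """D = (D1 D2; D3 D4) with diagonal blocks is invertible iff every 2x2 block
    (d1[j] d2[j]; d3[j] d4[j]) has a non-zero determinant modulo q."""
    return all((a * d - b * c) % q != 0 for a, b, c, d in zip(d1, d2, d3, d4))

def uv_encode(u, v, d1, d2, d3, d4, q=Q):
    """Return (u D1 + v D2, u D3 + v D4); each D_i is given by its diagonal."""
    left = [(ui * a + vi * b) % q for ui, vi, a, b in zip(u, v, d1, d2)]
    right = [(ui * c + vi * d) % q for ui, vi, c, d in zip(u, v, d3, d4)]
    return left + right

# Standard (U, U + V) case: D1 = D3 = D4 = I and D2 = 0 gives (u, u + v).
ones, zeros = [1] * 4, [0] * 4
print(uv_encode([1, 2, 0, 1], [0, 1, 1, 2], ones, zeros, ones, ones))
```

With the standard choice of the $\mathbf{D}_i$'s the sketch returns the concatenation of $\mathbf{u}$ and $\mathbf{u} + \mathbf{v}$ computed modulo 3.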

It is not the first time that (U, U + V) codes or generalized (U, U + V) codes are suggested
for a cryptographic use. (U, U + V) codes were already considered for constructing a McEliece
cryptosystem in [KKS05, p.225-228] and generalized (U, U + V) codes in [PMIB17]. However, neither
paper considered the improvement in error-correction performance that comes with
the (U, U + V)-construction (generalized or not) when a decoder that uses soft information is used.
This was first observed in the very same cryptographic context in [MCT16]. Having codes with
better error correction in this context makes it possible to reduce the key sizes of the scheme.
The (generalized) (U, U + V)-construction also potentially allows the use, for U and V, of codes
that would be insecure in this context if used alone, such as for instance generalized
Reed-Solomon codes. This makes it possible, for instance, to thwart the key attacks [SS96, CGG+14] on the
McEliece or Niederreiter scheme based on generalized Reed-Solomon codes [Nie86].

We push this idea further here, by allowing U and V to be completely random and by decoding
them with a very simple decoder, namely a variation of the Prange decoder [Pra62] that is able to
produce at will, for any parity-check matrix $\mathbf{H}$, a solution of (1) when $w$ is in the range $\left[\frac{q-1}{q}(n-k),\; n - \frac{n-k}{q}\right]$.
Note that this algorithm works in polynomial time and that outside this range of weights the
complexity of the best known algorithms is exponential in $n$ for weights of the form $w = \omega n$,
with $\omega$ a constant lying outside the interval $\left[\frac{q-1}{q}(1-R),\; 1 - \frac{1-R}{q}\right]$, where $R = k/n$. In the case of a
parity-check matrix of a generalized (U, U + V) code, a small tweak in the decoder is able to
take advantage of the generalized (U, U + V) structure to obtain smaller or larger weights $w$
outside this regime. This is in essence the trapdoor of our signature scheme. A further tweak in
the decoder, consisting in performing only a small amount of rejection sampling (with our choice
of parameters, one rejection every 3 or 4 signatures), makes it possible to obtain solutions that are uniformly
distributed over the words of weight $w$. Furthermore, we also show that syndromes $\mathbf{e}\mathbf{H}^{\intercal}$ associated
to this kind of codes are statistically indistinguishable from random syndromes when errors $\mathbf{e}$ are
drawn uniformly at random among the words of weight $w$. These are the two key properties for
obtaining a function $f_{\mathbf{H}}$ that is preimage sampleable in our signature scheme. Finally, a variation
of the proof technique of [GPV08] makes it possible to give a tight security proof of our signature
scheme that relies only on the hardness of two problems, namely

Decoding Problem: Solving at least one instance of the decoding problem (1) out of multiple
instances for a certain relative weight $w/n$ that is outside the range $[\omega^-_{\text{easy}}, \omega^+_{\text{easy}}]$, where
$\omega^-_{\text{easy}} \triangleq \frac{q-1}{q}\cdot\frac{r}{n}$ and $\omega^+_{\text{easy}} \triangleq \frac{n-r}{n} + \omega^-_{\text{easy}}$ (here $r = n - k$).

Distinguishing Problem: Deciding whether a linear code is a permuted generalized (U, U + V)
code or not.
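The Prange-style inversion mentioned above can be sketched in a few lines. This is a minimal Python sketch, not the paper's decoder: it assumes $q$ prime ($q = 3$ by default) and a full-rank $\mathbf{H}$, and the names `prange_decode` and `solve_mod`, as well as the rule for choosing how many non-zero entries to place on the information set, are our own illustrative choices. It puts $t$ non-zero entries on a random information set $\mathcal{I}$ and solves a linear system for the $n - k$ remaining positions, whose weight is about $\frac{q-1}{q}(n-k)$ on average; retrying until $|\mathbf{e}| = w$ therefore succeeds quickly for $w$ in the easy range. It makes no attempt to output uniformly distributed solutions (the paper adds rejection sampling for that).

```python
import random

def solve_mod(A, b, q):
    """Solve A x = b over F_q (q prime) for a square matrix A by Gauss-Jordan
    elimination; return None if A is singular."""
    m = len(A)
    M = [list(row) + [bi % q] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(m):
        piv = next((i for i in range(col, m) if M[i][col] % q), None)
        if piv is None:
            return None  # no pivot in this column: A is singular
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, q)  # modular inverse (Python >= 3.8)
        M[col] = [(x * inv) % q for x in M[col]]
        for i in range(m):
            if i != col and M[i][col]:
                f = M[i][col]
                M[i] = [(x - f * y) % q for x, y in zip(M[i], M[col])]
    return [M[i][m] for i in range(m)]

def prange_decode(H, s, w, q=3, rng=random, max_tries=10_000):
    """Hypothetical Prange-style search for e with e H^T = s and |e| = w."""
    r, n = len(H), len(H[0])
    k = n - r
    # e_J will have about (q-1)/q * r non-zero entries, so aim t = w - that, clipped to [0, k]
    t = max(0, min(k, round(w - (q - 1) * r / q)))
    for _ in range(max_tries):
        J = rng.sample(range(n), r)           # candidate complement of an information set
        J_set = set(J)
        I = [i for i in range(n) if i not in J_set]
        e = [0] * n
        for i in rng.sample(I, t):            # exactly t non-zero entries on I
            e[i] = rng.randrange(1, q)
        # right-hand side s - e_I H_I^T, computed row by row
        rhs = [(s[row] - sum(H[row][i] * e[i] for i in I)) % q for row in range(r)]
        x = solve_mod([[H[row][j] for j in J] for row in range(r)], rhs, q)
        if x is None:
            continue                          # H_J singular: try another J
        for j, xj in zip(J, x):
            e[j] = xj
        if sum(1 for v in e if v) == w:
            return e
    return None
```

The weight window reachable this way is $t + \frac{q-1}{q}(n-k)$ for $0 \le t \le k$, which is exactly the polynomial-time range $\left[\frac{q-1}{q}(n-k), n - \frac{n-k}{q}\right]$ quoted above.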

Interestingly, some recent work [CD17] has shown that these two properties (namely statistical
indistinguishability of the signatures and of the syndromes associated to the code family chosen in the
scheme) are also enough to obtain a tight security proof in the Quantum Random Oracle Model
(QROM) for generic code-based signatures, under the assumption that the Decoding Problem
remains hard against a quantum computer and that the code family which is used is computationally
indistinguishable from generic linear codes. In other words, this can be used to give a tight security
proof of our scheme based on generalized (U, U + V) codes in the QROM.


The Hardness of the Decoding Problem. All code-based cryptography relies upon this
problem. The problem of solving (1) for a $q$-ary $r \times n$ matrix $\mathbf{H}$ is well known to be hard when
we seek a word $\mathbf{e}$ of relative weight $w/n < \omega^-_{\text{easy}} = \frac{q-1}{q}\cdot\frac{r}{n}$. Beyond this point the problem is easily
solved with linear algebra until we reach $\omega^+_{\text{easy}} = \frac{n-r}{n} + \omega^-_{\text{easy}} = 1 - \frac{r}{qn}$, and the problem becomes
hard again, a fact which is less widely known and which we will use in this work.
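The two thresholds are simple closed forms, so they can be checked with exact rational arithmetic. The Python sketch below (the name `easy_range` is ours) computes $[\omega^-_{\text{easy}}, \omega^+_{\text{easy}}]$ from $q$, $n$ and $k$, taking $r = n - k$ as for a full-rank parity-check matrix, using the standard-library `fractions.Fraction` to avoid rounding.

```python
from fractions import Fraction

def easy_range(q, n, k):
    """Relative weights [omega_minus, omega_plus] between which e H^T = s is
    solvable in polynomial time by Prange-style linear algebra (r = n - k)."""
    r = n - k
    omega_minus = Fraction(q - 1, q) * Fraction(r, n)
    omega_plus = Fraction(n - r, n) + omega_minus
    return omega_minus, omega_plus

# Ternary codes of rate 1/2: the polynomial window is [1/3, 5/6].
print(easy_range(3, 12, 6))
```

For $q = 3$ and $R = 1/2$ this gives the window $[1/3, 5/6]$, and in general $\omega^+_{\text{easy}}$ coincides with $1 - \frac{r}{qn}$.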
Fig. 1. Asymptotic Hardness of Decoding


Furthermore, here we are in a case where the decoding problem has multiple solutions and
the adversary may produce any number of instances of (1) with the same matrix $\mathbf{H}$ and various
syndromes $\mathbf{s}$, and is interested in solving only one of them. This relates to the so-called Decoding
One Out of Many (DOOM) problem. This problem was first considered in [JJ02], where it was shown
how to adapt the known algorithms for decoding a linear code in order to solve this modified
problem. This modification was later analysed in [Sen11]. The parameters of the known algorithms
for solving (1) can be easily adapted to this scenario where we have to decode simultaneously
multiple instances which all have multiple solutions.


The Hardness of the Distinguishing Problem. This problem might seem at first sight to
be ad hoc. However, even in the very restricted case where the generalized (U, U + V)-code is
just a (U, U + V)-code and when the permutation is restricted to leave the right and left parts
globally stable, detecting whether the resulting code is a permuted (U, U + V)-code is an NP-complete
problem (see [DST17b, §7.1, Thm. 4]). Therefore the Distinguishing Problem is also NP-complete
for generalized (U, U + V)-codes. This theorem is proved in the case of binary (U, U + V)-codes in
[DST17b, §7.1, Thm. 3]. The proof given there carries over directly to an arbitrary finite field $\mathbb{F}_q$.
However, as observed in [DST17b, p. 3], these NP-completeness reductions hold in the particular
case where the dimensions $k_U$ and $k_V$ of the codes U and V satisfy $k_U < k_V$. If we stick to the
binary case, i.e. $q = 2$, then in order for our (U, U + V) decoder to work outside the integer interval
$\left[\frac{r}{2}, n - \frac{r}{2}\right]$, it is necessary that $k_U > k_V$. Unfortunately, in this case there is an efficient probabilistic
algorithm solving the distinguishing problem, based on the fact that the hull of
the permuted (U, U + V)-code is then typically of large dimension, namely $k_U - k_V$ (see [DST17a, §1,
p.1-2]). This problem cannot be settled in the binary case by considering generalized (U, U + V)
codes instead of just plain (U, U + V)-codes, since it is only for the restricted class of (U, U + V)
codes that the (U, U + V) decoder considered in [DST17a] is able to work properly outside the
critical interval $\left[\frac{r}{2}, n - \frac{r}{2}\right]$. This is closely related to the polarization phenomenon that led to the
famous construction of polar codes [Ari09]: in the binary case, there is only one $2 \times 2$ kernel that
polarizes, namely the kernel that corresponds to (U, U + V)-codes.
This situation changes drastically when we move to larger finite fields. There are already several
different $2 \times 2$ kernels that polarize over $\mathbb{F}_3$. In our cryptographic setting, this translates into the
fact that it is not only for (U, U + V)-codes that we can solve problem (1) efficiently. This
holds for a very large number of choices of the $\mathbf{D}_i$'s. The fact that there is a very large choice of matrices
$\mathbf{D}_i$ is clearly a very powerful phenomenon that makes the distinguishing problem much harder.
In terms of simplicity of the decoding procedure used in the signing process, it seems that defining
our codes over the finite field $\mathbb{F}_3$ is particularly attractive. In such a case, for the generalized (U, U + V)
codes that we have chosen and their decoding algorithm, there is a big gain in choosing the
signature weight $w$ to be very large, i.e. significantly above the upper limit $n - \frac{r}{3}$ below which the
decoding problem becomes polynomial as long as $w$ is also above $\frac{2r}{3}$. We will discuss this situation
in depth in §7. In this case, it seems that the best approach for solving the distinguishing problem
is based on the following observation: this code has codewords of weight slightly smaller
than the minimum distance of a random code of the same length and dimension. It is very tempting
to conjecture that the best algorithms for solving the Distinguishing Problem come from detecting
such codewords. This approach can be easily thwarted by choosing the parameters of the scheme
in such a way that the best algorithms for solving this task are of prohibitive complexity. Notice
that the best algorithms that we have for detecting such codewords are in essence precisely the
generic algorithms for solving the Decoding Problem. In some sense, it seems that we might rely
on the very same problem, namely solving the Decoding Problem, even if our proof technique does
not show this.

All in all, we propose to instantiate our signature scheme over the finite field $\mathbb{F}_3$, which gives
the first practical signature scheme based on ternary codes which comes with a security proof and
which scales well with the parameters: it can be shown that if one wants a security level of $2^\lambda$, then
the signature size is of order $O(\lambda)$, the public key size is of order $O(\lambda^2)$, signature generation is of order
$O(\lambda^3)$, whereas signature verification is of order $O(\lambda^2)$. It should be noted that, contrary to the
current thread of research in code-based or lattice-based cryptography, which consists in relying on
structured codes or lattices based on ring structures in order to decrease the key sizes, we did not
follow this approach here. This allows us, for instance, to rely on the NP-complete Decoding Problem,
which is generally believed to be hard on average, rather than on decoding in quasi-cyclic codes,
for instance, whose status is still unclear with a constant number of circulant blocks. Despite the
fact that we did not use the standard approach for reducing the key sizes relying on quasi-cyclic
codes for instance, we obtain acceptable key sizes (less than one megabyte for 128 bits of security),
which compare very favourably to unstructured lattice-based signature schemes such as TESLA
[ABB+17]. This is due in part to the tightness of our security reduction.


Organization of the Paper. The paper is organized as follows. We present the outline of our
scheme in §3, as well as the properties required by the GPV strategy, namely the
definition of one-way preimage sampleable functions. In §4 we give the trapdoor that we consider,
and in §5 we first show that it achieves the domain sampling with uniform output of preimage
sampleable functions and then we explain how to produce uniformly distributed signatures with
some rejection sampling. In §6 we prove that the scheme is existentially unforgeable under an
adaptive chosen message attack (EUF-CMA) in the random oracle model (ROM); in relation with
this proof, we examine in §7 and §8 respectively the best message and key attacks. Finally we give
some sets of parameters in line with the security reduction and with the current state of the art
for decoding techniques.


2 Notation 

We provide here some notation that will be used throughout the paper. 

General Notation. The notation $x \triangleq y$ means that $x$ is defined to be equal to $y$. We denote by
$\mathbb{F}_q$ the finite field with $q$ elements and by $S_{w,n}$ the subset of $\mathbb{F}_q^n$ of words of weight $w$ ($q$ will be
clear from the context). Since $n$ will also generally be clear from the context, we then just
write $S_w$.


Vector Notation. Vectors will be written with bold letters (such as $\mathbf{e}$) and uppercase bold
letters are used to denote matrices (such as $\mathbf{H}$). Vectors are in row notation. Let $\mathbf{x}$ and $\mathbf{y}$ be two
vectors; we will write $(\mathbf{x}, \mathbf{y})$ to denote their concatenation. We also denote by $\mathbf{x}_{\mathcal{I}}$ the vector whose
coordinates are those of $\mathbf{x} = (x_i)_{1 \le i \le n}$ which are indexed by $\mathcal{I}$, i.e.

$$\mathbf{x}_{\mathcal{I}} = (x_i)_{i \in \mathcal{I}}.$$

Sometimes we denote, for a vector $\mathbf{x}$, by $\mathbf{x}(i)$ its $i$-th entry, or, for a matrix $\mathbf{A}$, by $\mathbf{A}(i,j)$ its entry
in row $i$ and column $j$.

We define the support of $\mathbf{x} = (x_i)_{1 \le i \le n}$ as

$$\operatorname{Supp}(\mathbf{x}) = \{i \in \{1, \dots, n\} \text{ such that } x_i \neq 0\}.$$

The Hamming weight of $\mathbf{x}$ is denoted by $|\mathbf{x}|$. By some abuse of notation, we will use the same
notation to denote the size of a finite set: $|S|$ stands for the size of the finite set $S$. It will be clear
from the context whether $|\mathbf{x}|$ means the Hamming weight or the size of a finite set. Note that

$$|\mathbf{x}| = |\operatorname{Supp}(\mathbf{x})|.$$

By extension, the support $\operatorname{Supp}(\mathbf{M})$ of a matrix $\mathbf{M}$ is the union of the supports of its rows.

For a vector $\mathbf{x} \in \mathbb{F}_q^n$ and $a \in \mathbb{F}_q$, we denote by $|\mathbf{x}|_a$ the number of entries of $\mathbf{x}$ that are equal to
$a$:

$$|\mathbf{x}|_a = |\{i : x_i = a\}|.$$
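The vector notation above translates directly into code. The following Python sketch (function names are ours, vectors are plain lists, and index sets are 1-based to match the text) implements $\operatorname{Supp}$, the Hamming weight, $|\mathbf{x}|_a$, the restriction $\mathbf{x}_{\mathcal{I}}$ and the support of a matrix.

```python
def supp(x):
    """Supp(x): the 1-based indices i with x_i != 0."""
    return {i for i, xi in enumerate(x, start=1) if xi != 0}

def weight(x):
    """Hamming weight |x| = |Supp(x)|."""
    return len(supp(x))

def count_value(x, a):
    """|x|_a: number of entries of x equal to a."""
    return sum(1 for xi in x if xi == a)

def restrict(x, I):
    """x_I = (x_i)_{i in I} for a list I of 1-based indices."""
    return [x[i - 1] for i in I]

def supp_matrix(M):
    """Supp(M): union of the supports of the rows of M."""
    return set().union(*(supp(row) for row in M))

x = [0, 2, 1, 0, 2]  # a word of F_3^5
print(supp(x), weight(x), count_value(x, 2), restrict(x, [1, 3]))
```

For $\mathbf{x} = (0, 2, 1, 0, 2)$ the support is $\{2, 3, 5\}$, so $|\mathbf{x}| = 3$ and $|\mathbf{x}|_2 = 2$.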


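Probabilistic Notation. Let $S$ be a finite set; then $x \hookleftarrow S$ means that $x$ is assigned to be
a random element chosen uniformly at random in $S$. For a distribution $\mathcal{D}$ we write $\xi \sim \mathcal{D}$ to
indicate that the random variable $\xi$ is chosen according to $\mathcal{D}$. The uniform distribution on a
certain discrete set is denoted by $\mathcal{U}$; the set will be specified in the text. We denote the uniform
distribution on $S_w$ by $\mathcal{U}_w$. When we have probability distributions $\mathcal{D}_1, \mathcal{D}_2, \dots, \mathcal{D}_n$ over discrete
sets $\mathcal{E}_1, \mathcal{E}_2, \dots, \mathcal{E}_n$, we denote by $\mathcal{D}_1 \otimes \mathcal{D}_2 \otimes \dots \otimes \mathcal{D}_n$ the product probability distribution, i.e.
$\mathcal{D}_1 \otimes \dots \otimes \mathcal{D}_n(x_1, \dots, x_n) \triangleq \mathcal{D}_1(x_1) \cdots \mathcal{D}_n(x_n)$ for $(x_1, \dots, x_n) \in \mathcal{E}_1 \times \dots \times \mathcal{E}_n$. The $n$-th power
product of a distribution $\mathcal{D}$ is denoted by

$$\mathcal{D}^{\otimes n} \triangleq \underbrace{\mathcal{D} \otimes \dots \otimes \mathcal{D}}_{n \text{ times}}.$$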

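The statistical distance between two discrete probability distributions over the same space $\mathcal{E}$ is
defined as:

$$\rho(\mathcal{D}_0, \mathcal{D}_1) \triangleq \frac{1}{2} \sum_{x \in \mathcal{E}} |\mathcal{D}_0(x) - \mathcal{D}_1(x)|.$$

For two random variables $X$ and $Y$ ranging over the same space, we will also denote by $\rho(X, Y)$
the statistical distance between the distribution $\mathcal{D}_X$ of $X$ and the distribution $\mathcal{D}_Y$ of $Y$, that is

$$\rho(X, Y) = \rho(\mathcal{D}_X, \mathcal{D}_Y).$$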
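For finite distributions both definitions can be evaluated exactly. The Python sketch below (helper names are ours) represents a distribution as a dict from outcomes to probabilities, treats absent outcomes as having probability 0, and uses `fractions.Fraction` so that results are exact; `tensor` builds the product distribution $\mathcal{D}_1 \otimes \dots \otimes \mathcal{D}_n$ on tuples.

```python
from fractions import Fraction

def stat_distance(d0, d1):
    """rho(D0, D1) = 1/2 * sum_x |D0(x) - D1(x)| over the union of supports."""
    support = set(d0) | set(d1)
    return Fraction(1, 2) * sum(
        abs(Fraction(d0.get(x, 0)) - Fraction(d1.get(x, 0))) for x in support)

def tensor(*dists):
    """Product distribution D1 (x) ... (x) Dn on tuples (x1, ..., xn)."""
    result = {(): Fraction(1)}
    for d in dists:
        result = {xs + (x,): p * Fraction(px)
                  for xs, p in result.items() for x, px in d.items()}
    return result

coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
biased = {0: Fraction(3, 4), 1: Fraction(1, 4)}
print(stat_distance(coin, biased))                                  # 1/4
print(stat_distance(tensor(coin, coin), tensor(biased, biased)))    # 5/16
```

The second line illustrates that the distance between $n$-th power products grows with $n$, which is why per-sample distances must be negligible.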

Recall that a function $f(n)$ is said to be negligible if for all polynomials $p(n)$, $|f(n)| < p(n)^{-1}$ for
all sufficiently large $n$; we denote this by $f \in \operatorname{negl}(n)$.

Sometimes, when we wish to emphasize on which probability space the probabilities or the
expectations are taken, we denote by a subscript the random variable specifying the associated
probability space over which the probabilities or expectations are taken. For instance, the
probability $\mathbb{P}_X(\mathcal{E})$ of the event $\mathcal{E}$ is taken over $\Omega$, the probability space over which the random variable
$X$ is defined, i.e. if $X$ is for instance a real random variable, $X$ is a function from a probability
space $\Omega$ to $\mathbb{R}$, and the aforementioned probability is taken according to the probability chosen for
$\Omega$.
Coding Theory. For any matrix $\mathbf{M}$ we denote by $\langle \mathbf{M} \rangle$ the vector space spanned by its rows. A
$q$-ary linear code $\mathcal{C}$ of length $n$ and dimension $k$ is a subspace of $\mathbb{F}_q^n$ of dimension $k$ and is often
defined by a parity-check matrix $\mathbf{H}$ over $\mathbb{F}_q$ of size $r \times n$ as

$$\mathcal{C} = \langle \mathbf{H} \rangle^{\perp} = \{\mathbf{x} \in \mathbb{F}_q^n : \mathbf{x}\mathbf{H}^{\intercal} = \mathbf{0}\}.$$

When $\mathbf{H}$ is of full rank (which is usually the case) we have $r = n - k$. A generator matrix of $\mathcal{C}$ is
a $k \times n$ full-rank matrix $\mathbf{G}$ over $\mathbb{F}_q$ such that $\langle \mathbf{G} \rangle = \mathcal{C}$. The code rate, usually denoted by $R$, is
defined as the ratio $k/n$.

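An information set of a code $\mathcal{C}$ of length $n$ is a set of $k$ coordinate indices $\mathcal{I} \subset \{1, \dots, n\}$
which indexes $k$ independent columns of any generator matrix. Its complement indexes $n - k$
independent columns of any parity-check matrix. For any $\mathbf{s} \in \mathbb{F}_q^{n-k}$, $\mathbf{H} \in \mathbb{F}_q^{(n-k) \times n}$ and any
information set $\mathcal{I}$ of $\mathcal{C} = \langle \mathbf{H} \rangle^{\perp}$, for all $\mathbf{x} \in \mathbb{F}_q^n$ there exists a unique $\mathbf{e} \in \mathbb{F}_q^n$ such that $\mathbf{e}\mathbf{H}^{\intercal} = \mathbf{s}$ and
$\mathbf{x}_{\mathcal{I}} = \mathbf{e}_{\mathcal{I}}$.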
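The uniqueness statement can be checked exhaustively on a toy example. The Python sketch below assumes a hypothetical $2 \times 4$ parity-check matrix over $\mathbb{F}_3$ (so $k = 2$) and uses 0-based indices. The set $\{2, 3\}$ is an information set because its complement $\{0, 1\}$ carries the identity, whereas $\{0, 1\}$ is not, because columns 2 and 3 are proportional. The code enumerates all of $\mathbb{F}_3^4$ and lists the vectors $\mathbf{e}$ with $\mathbf{e}\mathbf{H}^{\intercal} = \mathbf{s}$ and $\mathbf{e}_{\mathcal{I}} = \mathbf{x}_{\mathcal{I}}$.

```python
from itertools import product

Q = 3
# Hypothetical toy parity-check matrix over F_3 (r = 2, n = 4, k = 2).
# Columns 2 and 3 are proportional, so {0, 1} is NOT an information set,
# while {2, 3} is one (its complement {0, 1} carries the identity).
H = [[1, 0, 1, 2],
     [0, 1, 1, 2]]

def syndrome(e, H=H, q=Q):
    """e H^T over F_q, returned as a tuple."""
    return tuple(sum(h * x for h, x in zip(row, e)) % q for row in H)

def completions(s, x, I, H=H, q=Q):
    """Brute-force list of all e in F_q^n with e H^T = s and e_I = x_I (0-based I)."""
    n = len(H[0])
    return [e for e in product(range(q), repeat=n)
            if syndrome(e, H, q) == tuple(s) and all(e[i] == x[i] for i in I)]

print(completions((1, 2), (0, 0, 1, 1), [2, 3]))  # exactly one completion
```

With the information set $\{2, 3\}$ every pair $(\mathbf{s}, \mathbf{x})$ has exactly one completion; with $\{0, 1\}$ the count is either 0 or 3, illustrating why the complement must index independent columns of $\mathbf{H}$.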

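We will also consider here the notion of punctured code. For a subset $\mathcal{I} \subset \{1, \dots, n\}$ and a
code $\mathcal{C}$ of length $n$, we denote by $\operatorname{Punc}_{\mathcal{I}}(\mathcal{C})$ the code $\mathcal{C}$ punctured in $\mathcal{I}$. This is defined as the
set $\{(c_j)_{j \in \{1, \dots, n\} \setminus \mathcal{I}} : \mathbf{c} \in \mathcal{C}\}$, in other words the set of vectors obtained by deleting in the
codewords of $\mathcal{C}$ the positions that belong to $\mathcal{I}$.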

3 The Wave-Signature Scheme 
3.1 Outline of the Scheme 

We define a probabilistic full domain hash (FDH) signature scheme, as in [BR96, Cor02]. We
replace RSA with a trapdoor function based upon the hardness of the Decoding Problem. Let
$\mathcal{C}$ be a linear code of dimension $k$ and length $n$ over $\mathbb{F}_q$ defined by a parity-check matrix $\mathbf{H}$ of size
$(n - k) \times n$. The one-way function $f_{\mathbf{H}}$ we consider is given by

$$f_{\mathbf{H}} : S_w \longrightarrow \mathbb{F}_q^{n-k}, \qquad \mathbf{e} \longmapsto \mathbf{e}\mathbf{H}^{\intercal}.$$

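Inverting this function on an input $\mathbf{s}$ amounts to solving the Decoding Problem. We are now ready
to give the general scheme we consider. Let us assume that we have a family of codes which is
defined by a set $\mathcal{F}$ of parity-check matrices of size $(n - k) \times n$ over $\mathbb{F}_q$ such that for all $\mathbf{H}_{\text{sk}} \in \mathcal{F}$
we have an algorithm $D_{\mathbf{H}_{\text{sk}}}$ which on input $\mathbf{s}$ computes $\mathbf{e} \in f_{\mathbf{H}_{\text{sk}}}^{-1}(\mathbf{s})$ (it will be the family of
generalized admissible (U, U + V) codes which are defined in §4.2). Then we pick uniformly at
random $\mathbf{H}_{\text{sk}} \in \mathcal{F}$, an $n \times n$ permutation matrix $\mathbf{P}$ and a non-singular matrix $\mathbf{S} \in \mathbb{F}_q^{(n-k) \times (n-k)}$, which
define the secret and public keys as:

$$\text{sk} \leftarrow (\mathbf{H}_{\text{sk}}, \mathbf{P}, \mathbf{S}); \qquad \text{pk} \leftarrow \mathbf{H}_{\text{pk}} \text{ where } \mathbf{H}_{\text{pk}} = \mathbf{S}\mathbf{H}_{\text{sk}}\mathbf{P}.$$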
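Remark 1. Let $\mathcal{C}_{\text{sk}}$ be the code defined by $\mathbf{H}_{\text{sk}}$; then $\mathbf{H}_{\text{pk}}$ defines the following code:

$$\mathcal{C}_{\text{pk}} = \{\mathbf{c}\mathbf{P} : \mathbf{c} \in \mathcal{C}_{\text{sk}}\}.$$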


We also select a cryptographic hash function $\text{Hash} : \{0,1\}^* \to \mathbb{F}_q^{n-k}$ and a parameter $\lambda_0$ for
the random salt $r$. The algorithms $\text{Sgn}^{\text{sk}}$ and $\text{Vrfy}^{\text{pk}}$ are defined as follows:


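$\text{Sgn}^{\text{sk}}(m)$:
  $r \hookleftarrow \{0,1\}^{\lambda_0}$
  $\mathbf{s} \leftarrow \text{Hash}(m, r)$
  $\mathbf{e} \leftarrow D_{\mathbf{H}_{\text{sk}}}\left(\mathbf{s}(\mathbf{S}^{-1})^{\intercal}\right)$
  return $(\mathbf{e}\mathbf{P}, r)$

$\text{Vrfy}^{\text{pk}}(m, (\mathbf{e}', r))$:
  $\mathbf{s} \leftarrow \text{Hash}(m, r)$
  if $\mathbf{e}'\mathbf{H}_{\text{pk}}^{\intercal} = \mathbf{s}$ and $|\mathbf{e}'| = w$ return 1
  else return 0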


Remark 2. We add a salt in the scheme in order to have a tight security proof. 
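Proof (Correctness of the verification step). The pair $(\mathbf{e}\mathbf{P}, r)$ passes the verification step because,
by definition of $D_{\mathbf{H}_{\text{sk}}}(\mathbf{s}(\mathbf{S}^{-1})^{\intercal})$, we have $\mathbf{e}\mathbf{H}_{\text{sk}}^{\intercal} = \mathbf{s}(\mathbf{S}^{-1})^{\intercal}$. Therefore, using $\mathbf{P}^{\intercal} = \mathbf{P}^{-1}$ for the permutation matrix $\mathbf{P}$,

$$(\mathbf{e}\mathbf{P})\mathbf{H}_{\text{pk}}^{\intercal} = \mathbf{e}(\mathbf{H}_{\text{pk}}\mathbf{P}^{\intercal})^{\intercal} = \mathbf{e}(\mathbf{H}_{\text{pk}}\mathbf{P}^{-1})^{\intercal} = \mathbf{e}(\mathbf{S}\mathbf{H}_{\text{sk}})^{\intercal} = \mathbf{e}\mathbf{H}_{\text{sk}}^{\intercal}\mathbf{S}^{\intercal} = \mathbf{s}(\mathbf{S}^{-1})^{\intercal}\mathbf{S}^{\intercal} = \mathbf{s}.$$

We also have $|\mathbf{e}\mathbf{P}| = |\mathbf{e}| = w$.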
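The chain of equalities can be sanity-checked numerically. The Python sketch below makes several assumptions of its own: $q = 3$, a random (unstructured) $\mathbf{H}_{\text{sk}}$ in place of a generalized (U, U + V) parity-check matrix, and a random $\mathbf{e}$ in place of the decoder output. It generates the syndrome $\mathbf{s}$ that the decoder would have been asked to invert, i.e. $\mathbf{s} = (\mathbf{e}\mathbf{H}_{\text{sk}}^{\intercal})\mathbf{S}^{\intercal}$, and checks that $(\mathbf{e}\mathbf{P})\mathbf{H}_{\text{pk}}^{\intercal} = \mathbf{s}$ and $|\mathbf{e}\mathbf{P}| = |\mathbf{e}|$. The non-singular $\mathbf{S}$ is built as $\mathbf{L}\mathbf{U}$ with unit lower-triangular $\mathbf{L}$ and upper-triangular $\mathbf{U}$ with non-zero diagonal, a convenient choice that is not uniform over invertible matrices.

```python
import random

Q = 3  # assumed field size (prime)

def matmul(A, B, q=Q):
    """Matrix product over F_q (matrices are lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) % q for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def random_nonsingular(m, rng, q=Q):
    """S = L U with L unit lower-triangular and U upper-triangular with a
    non-zero diagonal; det(S) is the product of U's diagonal, hence non-zero."""
    L = [[1 if i == j else (rng.randrange(q) if i > j else 0) for j in range(m)] for i in range(m)]
    U = [[rng.randrange(1, q) if i == j else (rng.randrange(q) if i < j else 0)
          for j in range(m)] for i in range(m)]
    return matmul(L, U, q)

def random_permutation_matrix(n, rng):
    perm = list(range(n))
    rng.shuffle(perm)
    return [[1 if perm[i] == j else 0 for j in range(n)] for i in range(n)]

def check_correctness(n=8, k=4, seed=0, q=Q):
    """Check (eP) H_pk^T == s whenever e H_sk^T == s (S^{-1})^T, i.e. s = (e H_sk^T) S^T."""
    rng = random.Random(seed)
    H_sk = [[rng.randrange(q) for _ in range(n)] for _ in range(n - k)]
    S = random_nonsingular(n - k, rng, q)
    P = random_permutation_matrix(n, rng)
    H_pk = matmul(matmul(S, H_sk, q), P, q)                      # H_pk = S H_sk P
    e = [[rng.randrange(q) for _ in range(n)]]                   # row vector as a 1 x n matrix
    s = matmul(matmul(e, transpose(H_sk), q), transpose(S), q)   # s = (e H_sk^T) S^T
    eP = matmul(e, P, q)
    lhs = matmul(eP, transpose(H_pk), q)                         # (eP) H_pk^T
    same_weight = sum(x != 0 for x in eP[0]) == sum(x != 0 for x in e[0])
    return lhs == s and same_weight

print(all(check_correctness(seed=i) for i in range(20)))
```

The identity holds for any $\mathbf{H}_{\text{sk}}$, which is why a random matrix is enough for this check; the trapdoor structure only matters for actually computing $\mathbf{e}$.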

To summarize, a valid signature of a message $m$ consists of a pair $(\mathbf{e}, r)$ such that $\mathbf{e}\mathbf{H}_{\text{pk}}^{\intercal} = \text{Hash}(m, r)$ with $\mathbf{e}$ of Hamming weight $w$.
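The verification condition summarized above is easy to state in code. The Python sketch below covers verification only (no signing), and its `hash_to_syndrome` is a hypothetical instantiation of $\text{Hash}$, not the one specified for Wave: it expands SHAKE-256 (from the standard `hashlib` module) over a 4-byte counter, the salt and the message, and rejects bytes $\geq 256 - (256 \bmod q)$ so that each kept byte reduced modulo $q$ is uniform on $\mathbb{F}_q$. Concatenating salt and message is unambiguous only because the salt has the fixed length $\lambda_0$.

```python
import hashlib

Q = 3  # ternary field, as in the instantiation discussed in the text

def hash_to_syndrome(m: bytes, r: bytes, length: int, q: int = Q):
    """Hypothetical Hash(m, r) -> F_q^length built from SHAKE-256 with rejection
    of bytes >= 256 - (256 % q), so that byte % q is uniform on F_q."""
    limit = 256 - 256 % q
    out, counter = [], 0
    while len(out) < length:
        digest = hashlib.shake_256(counter.to_bytes(4, "big") + r + m).digest(2 * length)
        out.extend(b % q for b in digest if b < limit)
        counter += 1
    return out[:length]

def verify(H_pk, w, m: bytes, signature, q: int = Q):
    """Vrfy^pk(m, (e', r)): accept (1) iff e' H_pk^T = Hash(m, r) and |e'| = w."""
    e, r = signature
    s = hash_to_syndrome(m, r, len(H_pk), q)
    syndrome = [sum(h * x for h, x in zip(row, e)) % q for row in H_pk]
    weight = sum(1 for x in e if x % q)
    return 1 if syndrome == s and weight == w else 0
```

For a toy public key of the form $(\mathbf{I} \mid \mathbf{A})$, the vector $(\mathbf{s}, \mathbf{0})$ is a preimage of $\mathbf{s} = \text{Hash}(m, r)$, so it verifies exactly when $w$ equals its weight; a real signer instead needs the trapdoor to hit the prescribed large weight $w$.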


3.2 One-way Preimage Sampleable Code-based Functions 

Classically, FDH signature schemes such as RSA are based on a trapdoor one-way function $f :
\mathcal{D} \to \mathcal{A}$ which is a permutation. In this case the trapdoor makes it possible to compute, for any $a \in \mathcal{A}$, the
unique $d \in \mathcal{D}$ (the signature of $a$) which verifies $f(d) = a$. Therefore, in the random oracle model,
when $a$ follows the uniform distribution, the signatures $d$ which are produced are uniform too. In this
way, the property of being a permutation for the trapdoor one-way function offers a first level
of security for the signature scheme. Nevertheless, for the trapdoor one-way function $f_{\mathbf{H}}$ which
is used in our code-based scheme, this condition of being a permutation is too strong to be met in
an interesting way. This situation does not only arise for code-based trapdoor FDH signatures;
it appears in lattice-based cryptography too. In this context, the authors of [GPV08] gave additional
properties to be verified by the one-way function. One of the crucial properties asked for is
that the output of the algorithm that inverts $f_{\mathbf{H}}$ using the trapdoor is close to the uniform distribution on
the inputs. In the following we will speak of the GPV strategy. The authors of [GPV08] summarized this
in the definition of preimage sampleable functions (see [GPV08, Definition 5.3.1]). To simplify the
definition, we will directly express this notion for a restricted (and simplified) class of code-based
trapdoor functions.

Definition 1 (One-way preimage sampleable code-based functions). It is a pair of probabilistic
polynomial-time algorithms (Trapdoor, InvertAlg) together with a triple of functions
$(n(\lambda), k(\lambda), w(\lambda))$ growing polynomially with the security parameter $\lambda$ and giving the length and
dimension of the codes and the signature weight we consider, such that

— Trapdoor, when given $\lambda$, outputs $(\mathbf{H}, T)$ where $\mathbf{H}$ is an $(n - k) \times n$ matrix over $\mathbb{F}_q$ and $T$ the
trapdoor corresponding to $\mathbf{H}$. Here and elsewhere we drop the dependence on $\lambda$ of the functions
$n$, $k$ and $w$.

— InvertAlg is a probabilistic algorithm which takes as input $T$ and an element $\mathbf{s} \in \mathbb{F}_q^{n-k}$ and
outputs an $\mathbf{e} \in S_{w,n}$ such that $\mathbf{e}\mathbf{H}^{\intercal} = \mathbf{s}$.

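The following properties have to hold for all but a negligible fraction of $\mathbf{H}$ output by Trapdoor.

1. Domain Sampling with uniform output:

$$\rho(\mathbf{e}\mathbf{H}^{\intercal}, \mathbf{s}) \in \operatorname{negl}(\lambda)$$

where $\mathbf{e}$ and $\mathbf{s}$ are two random variables, with $\mathbf{e}$ being uniformly distributed over $S_{w,n}$ and $\mathbf{s}$
being uniformly distributed over $\mathbb{F}_q^{n-k}$.

2. Preimage Sampling with trapdoor: for every $\mathbf{s} \in \mathbb{F}_q^{n-k}$, we have

$$\rho(\text{InvertAlg}(\mathbf{s}, T), \mathbf{e}) \in \operatorname{negl}(\lambda),$$

where $\mathbf{e}$ is uniformly distributed in $S_{w,n}$.

3. One-wayness without trapdoor: for any probabilistic poly-time algorithm $\mathcal{A}$ outputting an
element $\mathbf{e} \in S_{w,n}$ when given $\mathbf{H} \in \mathbb{F}_q^{(n-k) \times n}$ and $\mathbf{s} \in \mathbb{F}_q^{n-k}$, the probability that $\mathbf{e}\mathbf{H}^{\intercal} = \mathbf{s}$
is negligible, where the probability is taken over the choice of $\mathbf{H}$, the target value $\mathbf{s}$ chosen
uniformly at random, and $\mathcal{A}$'s random coins.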
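Property 1 is an asymptotic statement, but for toy parameters the quantity $\rho(\mathbf{e}\mathbf{H}^{\intercal}, \mathbf{s})$ can be computed exactly. The Python sketch below (assuming $q$ prime, $q = 3$ by default, with helper names of our own) enumerates every $\mathbf{e} \in S_w$, tallies the syndromes to get the exact distribution of $\mathbf{e}\mathbf{H}^{\intercal}$, and measures its statistical distance to the uniform distribution on $\mathbb{F}_q^r$. Enumeration is only feasible for very small $n$; it says nothing about the negligibility of $\rho$ for real parameters.

```python
from fractions import Fraction
from itertools import combinations, product

def syndrome_distribution(H, w, q=3):
    """Exact distribution of e H^T when e is uniform over S_w (words of weight w)."""
    r, n = len(H), len(H[0])
    counts, total = {}, 0
    for positions in combinations(range(n), w):        # support of e
        for values in product(range(1, q), repeat=w):  # non-zero entries on it
            s = tuple(sum(H[i][j] * v for j, v in zip(positions, values)) % q
                      for i in range(r))
            counts[s] = counts.get(s, 0) + 1
            total += 1
    return {s: Fraction(c, total) for s, c in counts.items()}

def distance_to_uniform(dist, r, q=3):
    """rho(dist, U) for U uniform on F_q^r, with rho = 1/2 * sum |D0(x) - D1(x)|."""
    u = Fraction(1, q ** r)
    return Fraction(1, 2) * sum(abs(dist.get(s, 0) - u) for s in product(range(q), repeat=r))

H = [[1, 0, 1, 2, 1],
     [0, 1, 1, 1, 2]]  # hypothetical toy parity-check matrix over F_3
print(distance_to_uniform(syndrome_distribution(H, 3), 2))
```

As a sanity check, for $w = 0$ every syndrome is $\mathbf{0}$, so the distance equals $1 - q^{-r}$, the largest possible value.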
It will turn out that, by choosing the parameters appropriately and by choosing $\mathbf{H}$ such that it is
a parity-check matrix of a permuted generalized (U, U + V) code, we will be able to solve $\mathbf{e}\mathbf{H}^{\intercal} = \mathbf{s}$
by using the underlying generalized (U, U + V) structure in a regime of parameters where solving
this problem for generic linear codes is thought to be hard. Moreover, by a suitable rejection
technique, we will show how the decoding algorithm using the trapdoor can be made oblivious of
the underlying trapdoor. This is the preimage sampling condition of the previous definition. This
mimics in a sense what has been achieved in the lattice setting of [GPV08], where the inversion
algorithm is oblivious to the particular geometry of the trapdoor basis. Similarly to what has
been achieved in [GPV08], we will also show that our construction based on permuted generalized
(U, U + V) codes also verifies the domain sampling condition.

Under the assumption that the Distinguishing and Decoding Problems are hard, we could have
shown that the coding-theoretic function $f_{\mathbf{H}}$ that we consider here is one-way. This would show
that our code-based construction is a preimage sampleable function. However, the proof technique
given in [GPV08] for showing the security of the signature scheme based on a preimage sampleable function
relies on a stronger version of a preimage sampleable function: it should also
be collision resistant. Our code-based construction will not meet this condition for the particular
choice we will make for our scheme. This comes from the fact that we will focus on a ternary
alphabet $q = 3$ and very large values of $w$. We will therefore proceed in a slightly different way in our
case. We give a security reduction relying on the assumptions that the Distinguishing and
Decoding Problems are hard, and on the preimage sampling property on the one hand and the domain
sampling property on the other hand. The preimage sampling condition of the previous definition
is here to ensure that the signatures which are produced do not leak any information, whereas the
domain sampling condition may seem more surprising. As we will see, it naturally appears in the
security reduction, as it makes it possible to inject a hard instance of the Decoding Problem to which we
reduce.

4 Inverting the Syndrome Function 

This section is devoted to the inversion of $f_H$. It amounts to solving the following problem.

Problem 1 (Syndrome Decoding with fixed weight). Given $H \in \mathbb{F}_q^{(n-k)\times n}$, $s \in \mathbb{F}_q^{n-k}$, and an integer w, find $e \in \mathbb{F}_q^n$ such that $eH^T = s$ and $|e| = w$.

Here

— we recall for which interval $[w^-, w^+]$ of values of w we may hope to invert $f_H$ on any possible output;

— we recall in which interval of values $[w^-_{easy}, w^+_{easy}] \subset [w^-, w^+]$ it is easy to invert $f_H$ for any parity-check matrix H without using any trapdoor: this is the well-known Prange decoder;

— we then explain how, in the particular case of a generalized (U, U + V) code, we can invert $f_H$ by tweaking the Prange decoder for a significantly larger range $[w^-_{UV}, w^+_{UV}]$ of w than $[w^-_{easy}, w^+_{easy}]$. This is the key that shows how to exploit the underlying (U, U + V) structure as a trapdoor for inverting $f_H$.

Any solver of Problem 1 will be called a decoding algorithm, whatever the weight w. For small weights there is at most one solution and the problem relates to error correction. For larger weights we may have exponentially many solutions and the problem relates to source-distortion theory, in which it is also referred to as decoding.


4.1 Generic Solutions 

Surjective Domain of the Syndrome Function. The issue here is for which values of w we may expect $f_H$ to be surjective. This clearly implies $|S_w| \geq q^{n-k}$, where $|S_w| = \binom{n}{w}(q-1)^w$ is the number of words of weight w. In other words, we have the following simple fact.
Fact 1. If $f_H$ is surjective, then necessarily $w \in [w^-, w^+]$ with
$w^- = \min\left\{w \in [0,n] : \binom{n}{w}(q-1)^w \geq q^{n-k}\right\}$,
$w^+ = \max\left\{w \in [0,n] : \binom{n}{w}(q-1)^w \geq q^{n-k}\right\}$.

For a fixed rate R = k/n, it is readily verified that the asymptotic (in n) behaviour of $w^-$ and $w^+$ is given by

$\frac{w^-}{n} = g_q^-(1-R) + o(1)$   (2)
$\frac{w^+}{n} = g_q^+(1-R) + o(1)$ if $R \leq 1 - \log_q(q-1)$   (3)
$\frac{w^+}{n} = 1$ otherwise,   (4)

where $h_q(x) = -(1-x)\log_q(1-x) - x\log_q\left(\frac{x}{q-1}\right)$ is the q-ary entropy function, $g_q^-$ is its inverse ranging over $[0, (q-1)/q]$, whereas $g_q^+$ is its inverse ranging over $[(q-1)/q, 1]$, but whose domain is restricted to $[\log_q(q-1), 1]$. This motivates the definition of $\omega^-$ and $\omega^+$ as

$\omega^- = g_q^-(1-R)$   (5)
$\omega^+ = g_q^+(1-R)$ if $R \leq 1 - \log_q(q-1)$   (6)
$\omega^+ = 1$ otherwise.   (7)
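The bounds (5)-(7) are easy to evaluate numerically. The following Python sketch is our own illustration, not part of the scheme: it inverts $h_q$ by bisection on each of its two monotone branches, and the helper names (`h_q`, `omega_bounds`) are hypothetical.

```python
import math


def h_q(x: float, q: int) -> float:
    """q-ary entropy -(1-x) log_q(1-x) - x log_q(x/(q-1)), with 0 log 0 = 0."""
    res = 0.0
    if x < 1:
        res -= (1 - x) * math.log(1 - x, q)
    if x > 0:
        res -= x * math.log(x / (q - 1), q)
    return res


def _bisect(f, lo, hi, target, increasing, iters=100):
    """Find x in [lo, hi] with f(x) = target, f monotone on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid          # root lies to the right of mid
        else:
            hi = mid          # root lies to the left of mid
    return (lo + hi) / 2


def omega_bounds(R: float, q: int) -> tuple[float, float]:
    """Asymptotic (omega^-, omega^+) of equations (5)-(7)."""
    y = 1 - R
    top = (q - 1) / q                     # h_q reaches its maximum 1 here
    f = lambda x: h_q(x, q)
    w_minus = _bisect(f, 0.0, top, y, increasing=True)     # g_q^-(1-R)
    if y >= math.log(q - 1, q):           # 1-R lies in the domain of g_q^+
        w_plus = _bisect(f, top, 1.0, y, increasing=False)  # g_q^+(1-R)
    else:
        w_plus = 1.0
    return w_minus, w_plus
```

For instance, `omega_bounds(0.5, 3)` gives $\omega^- \approx 0.16$ and $\omega^+ = 1$, since $1 - R = 0.5 < \log_3 2$.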


The Gilbert-Varshamov distance corresponds to the smallest radius of a ball in $\mathbb{F}_q^n$ centred around 0 whose volume is above $q^{n-k}$. Since the volume of a ball is well approximated by the area of the sphere, $w^-$ is actually very close (if not equal) to the Gilbert-Varshamov distance, and $\omega^-$ is precisely the asymptotic relative Gilbert-Varshamov distance. A straightforward computation of the expected number of errors e of weight w such that $eH^T = s$ when H is random shows that we expect an exponential number of solutions when w/n lies in $(\omega^-, \omega^+)$:

Proposition 1. Let n, k, w be integers with k < n and $s \in \mathbb{F}_q^{n-k}$. The expected number of solutions of $eH^T = s$ in e of weight w, when H is chosen uniformly at random in $\mathbb{F}_q^{(n-k)\times n}$, is given by:

$\frac{\binom{n}{w}(q-1)^w}{q^{n-k}}$.

When n tends to infinity but w and k are such that $w = \omega n$ and $k = Rn$ with ω fixed in $(\omega^-, \omega^+)$ and R fixed in (0, 1), this expected number of solutions behaves like $e^{\alpha n(1+o(1))}$ for a certain α > 0.
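To see Proposition 1 at work, the following sketch (our own illustration; the function names are hypothetical) evaluates the base-2 logarithm of the expected number of solutions with `math.lgamma`, which avoids computing huge binomial coefficients exactly.

```python
import math


def log2_binom(n: int, w: int) -> float:
    """log2 of C(n, w), computed through lgamma."""
    return (math.lgamma(n + 1) - math.lgamma(w + 1) - math.lgamma(n - w + 1)) / math.log(2)


def log2_expected_solutions(n: int, k: int, w: int, q: int) -> float:
    """log2 of C(n, w) (q-1)^w / q^(n-k), the expectation in Proposition 1."""
    return log2_binom(n, w) + w * math.log2(q - 1) - (n - k) * math.log2(q)
```

With n = 1000, k = 500 and q = 3, the value is positive for w = 500 and negative for w = 50, in line with $\omega^- \approx 0.16$ at R = 1/2.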


However, despite this exponential number of solutions, coding theory has never come up with an efficient algorithm for finding a solution to this problem in the whole range $(\omega^-, \omega^+)$. An efficient solution is only known for a subrange, by using the Prange decoder.


Easy Domain of the Syndrome Function. The subrange of $(\omega^-, \omega^+)$ for which we know how to solve Problem 1 efficiently is given by the condition $w/n \in [\omega^-_{easy}, \omega^+_{easy}]$ where

$\omega^-_{easy} = \frac{q-1}{q}(1-R)$   (8)
$\omega^+_{easy} = \frac{q-1}{q}(1-R) + R$,   (9)
where R = k/n. This is achieved by a slightly generalized version of the Prange decoder [Pra62]. To explain how it works, consider a linear code C over $\mathbb{F}_q$ of length n and dimension k defined by a parity-check matrix H. For a given s, we want to find an error e of weight w such that $eH^T = s$. Roughly speaking, the subspace structure of C offers k symbols of e that can be chosen arbitrarily, while the other n − k symbols are then uniquely determined. Indeed, H is a full-rank matrix and it therefore contains an invertible submatrix A of size (n − k) × (n − k). We choose a set of positions J of size n − k for which H restricted to these positions is a full-rank matrix. For simplicity, assume that these are the first n − k positions: H = (A | B). We look for an e of the form e = (e'', e') where $e' \in \mathbb{F}_q^k$ and $e'' \in \mathbb{F}_q^{n-k}$. We should therefore have $s = eH^T = e''A^T + e'B^T$, that is $e'' = (s - e'B^T)(A^{-1})^T$. In this way we can choose the part e' of length k arbitrarily. Therefore, if we look for an error of low weight, we can set these k positions to 0. For the remaining part we expect to get about $\frac{q-1}{q}(n-k)$ positions that are non-zero. On the other hand, to get an error of largest possible weight, the best strategy seems to be to set the k positions to non-zero values. In this case we also get in the remaining part about $\frac{q-1}{q}(n-k)$ non-zero positions. The weights that are easily attainable by this strategy are therefore $\frac{q-1}{q}(n-k) = n\omega^-_{easy}$ and $k + \frac{q-1}{q}(n-k) = n\omega^+_{easy}$. We can get all intermediate weights in this interval by choosing the appropriate number of zeros in the k positions of e'. For reasons that will appear when we consider a decoder for generalized (U, U + V)-codes, it will be convenient to choose the weight of e' to be a random variable. In other words, the generalized Prange decoder is as given in Algorithm 1.


Algorithm 1 PrangeOne(H, s) — One iteration of the Prange decoder

Parameters: q, n, k, $\mathcal{D}$ a distribution over $[0, k]$
Require: $H \in \mathbb{F}_q^{(n-k)\times n}$, $s \in \mathbb{F}_q^{n-k}$
Ensure: $eH^T = s$
1: $t \leftarrow \mathcal{D}$
2: $I \leftarrow$ InfoSet(H)
3: $x \xleftarrow{\$} \{x \in \mathbb{F}_q^n : |x_I| = t\}$
4: $e \leftarrow$ PrangeStep(H, s, I, x)
5: return e

function InfoSet(H) — information set
Require: $H \in \mathbb{F}_q^{(n-k)\times n}$
Ensure: the returned value I is an information set of $\langle H\rangle^\perp$

An information set of a k-dimensional code is a set of k coordinate indices such that, on any k × n generator matrix, it indexes a non-singular k × k submatrix. Its complement indexes a non-singular (n − k) × (n − k) submatrix on any (n − k) × n parity-check matrix.

function PrangeStep(H, s, I, x) — Prange vector completion
Require: $H \in \mathbb{F}_q^{(n-k)\times n}$, $s \in \mathbb{F}_q^{n-k}$, I an information set of $\langle H\rangle^\perp$, $x \in \mathbb{F}_q^n$
Ensure: $eH^T = s$ and $e_I = x_I$
  P ← any n × n permutation matrix sending I on the last k coordinates
  (A | B) ← HP   ▷ $A \in \mathbb{F}_q^{(n-k)\times(n-k)}$
  (0 | e') ← xP   ▷ $e' \in \mathbb{F}_q^k$
  $e \leftarrow \left((s - e'B^T)(A^{-1})^T, e'\right)P^T$
  return e
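Algorithm 1 can be made executable for a prime q. The Python sketch below is an illustration under simplifying assumptions of ours: t is passed directly instead of being drawn from $\mathcal{D}$, and instead of a dedicated InfoSet routine we draw n − k random positions and return None when the corresponding columns of H are singular, so that the caller retries. All names are hypothetical.

```python
import random


def solve_left(A, b, q):
    """Solve y A = b over F_q (q prime) for a square A; None if A is singular."""
    m = len(A)
    # Equation i reads sum_j y_j A[j][i] = b[i]: augmented system A^T | b.
    M = [[A[j][i] % q for j in range(m)] + [b[i] % q] for i in range(m)]
    for col in range(m):
        piv = next((r for r in range(col, m) if M[r][col]), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], q - 2, q)      # inverse by Fermat's little theorem
        M[col] = [v * inv % q for v in M[col]]
        for r in range(m):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(a - f * c) % q for a, c in zip(M[r], M[col])]
    return [M[i][m] for i in range(m)]


def syndrome(e, H, q):
    """Return e H^T over F_q."""
    return [sum(x * h for x, h in zip(e, row)) % q for row in H]


def prange_one(H, s, q, t, rng=random):
    """One Prange iteration: e with e H^T = s and exactly t non-zero symbols
    on the chosen information set, or None if the column choice is singular."""
    r, n = len(H), len(H[0])                  # r = n - k
    cols = list(range(n))
    rng.shuffle(cols)
    J, I = cols[:r], cols[r:]                 # I: candidate information set
    e = [0] * n
    for i in rng.sample(I, t):                # free part: t non-zero symbols on I
        e[i] = rng.randrange(1, q)
    # Residual syndrome s - e_I H_I^T, to be matched by the positions in J.
    res = [(s[a] - sum(e[i] * H[a][i] for i in I)) % q for a in range(r)]
    A = [[H[a][j] for a in range(r)] for j in J]   # (H restricted to J)^T
    y = solve_left(A, res, q)
    if y is None:
        return None
    for j, v in zip(J, y):
        e[j] = v
    return e
```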


This algorithm represents one step of the Prange decoder, and it is called as many times as needed to produce an error e of weight w. The probability distribution of the weights of the e's output by this function is readily seen to be given by the following proposition.

Proposition 2. When H is chosen uniformly at random in $\mathbb{F}_q^{(n-k)\times n}$ and s uniformly at random in $\mathbb{F}_q^{n-k}$, we can write the weight of the e's output by PrangeOne(H, s) as

$|e| = S + T$

where S and T are independent random variables, $S \in [0, n-k]$, $T \in [0, k]$, S is the Hamming weight of a vector that is uniformly distributed over $\mathbb{F}_q^{n-k}$ and $\mathbb{P}(T = t) = \mathcal{D}(t)$. The distribution of |e| is given by

$\mathbb{P}(|e| = w) = \sum_{t=0}^{w} \frac{\binom{n-k}{w-t}(q-1)^{w-t}}{q^{n-k}}\,\mathcal{D}(t)$   (10)
$\mathbb{E}(|e|) = \overline{\mathcal{D}} + \frac{q-1}{q}(n-k) = \overline{\mathcal{D}} + n\omega^-_{easy}$,   (11)

where $\overline{\mathcal{D}} = \sum_{t=0}^{k} t\,\mathcal{D}(t)$.
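Equation (10) is the convolution of the law of S with $\mathcal{D}$. The short sketch below (our own, with hypothetical names) computes it, which makes it easy to check (11) numerically.

```python
from math import comb


def prange_weight_distribution(n, k, q, D):
    """Distribution of |e| = S + T (Proposition 2, equation (10)).
    D maps t in [0, k] to P(T = t); S is the weight of a uniform vector of F_q^(n-k)."""
    r = n - k
    p_S = [comb(r, s) * (q - 1) ** s / q ** r for s in range(r + 1)]
    dist = [0.0] * (n + 1)
    for t, p_t in D.items():
        for s, p_s in enumerate(p_S):
            dist[s + t] += p_t * p_s          # S and T are independent
    return dist
```

For n = 20, k = 10, q = 3 and $\mathcal{D}$ uniform on {3, 7}, the mean of this distribution is $5 + \frac{2}{3}\cdot 10 \approx 11.67$, as predicted by (11).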

From this proposition, we deduce immediately that any weight w in $[\omega^-_{easy} n, \omega^+_{easy} n]$ can be reached by this Prange decoder with a probabilistic polynomial-time algorithm that uses a distribution $\mathcal{D}$ such that $\overline{\mathcal{D}} = w - \omega^-_{easy} n$. Being able to choose the probability distribution $\mathcal{D}$ will be helpful in what follows: it gives a rather large degree of freedom in the distribution of |e|, which will come in very handy to simulate an output distribution that is uniform over the words of weight w in the generalized (U, U + V) code decoder that we consider later.
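One simple way to obtain $\overline{\mathcal{D}} = w - \omega^-_{easy}n$ is a two-point distribution on the integers surrounding that mean. This is only one possible choice, made here for illustration; the helpers below are hypothetical and only control the expected weight.

```python
import math


def prange_easy_range(n, k, q):
    """[n*omega^-_easy, n*omega^+_easy] from equations (8)-(9)."""
    low = (q - 1) * (n - k) / q        # the k free positions are all zero
    return low, low + k                # ... or all non-zero


def mean_matching_distribution(n, k, q, w):
    """Two-point distribution D on [0, k] with mean w - n*omega^-_easy,
    so that the expected weight (11) equals w."""
    low, high = prange_easy_range(n, k, q)
    if not low <= w <= high:
        raise ValueError("w lies outside the generic easy domain")
    m = w - low
    t0 = math.floor(m)
    frac = m - t0
    if frac == 0:
        return {t0: 1.0}
    return {t0: 1.0 - frac, t0 + 1: frac}
```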

To summarize this discussion, we have shown that when we want to build a code-based signature scheme, w has to verify $w^- \leq w \leq w^+$ to ensure that $f_H$ is surjective, with an expected exponential number of solutions for a given syndrome (see Proposition 1). However, in a cryptographic setting w/n cannot lie in $[\omega^-_{easy}, \omega^+_{easy}] \subset [\omega^-, \omega^+]$, otherwise anybody using the generalized Prange algorithm would be able to invert $f_H$. All of this is summarized in Figure 2, where we draw these different areas of w/n asymptotically in n when k/n is fixed.


Fig. 2. Areas of relative signature distances (horizontal axis: w/n).



Enlarging the Easy Domain $[\omega^-_{easy}, \omega^+_{easy}]$. Inverting the syndrome function $f_H$ is the basic problem upon which all code-based cryptography relies. This problem has been studied for a long time for relative weights $w/n < \omega^-_{easy}$ and, despite many efforts, the best algorithms [Ste88, Dum91, Bar97, MMT11, BJMM12, MO15, DT17] for solving it are all exponential. In other words, after fifty years of thorough research, none of these algorithms has achieved a polynomial complexity for relative weights $w/n < \omega^-_{easy}$. Furthermore, by adapting all the previous algorithms beyond this point we observe the same behaviour for all of them: they are polynomial in the range of relative weights $[\omega^-_{easy}, \omega^+_{easy}]$ and become exponential once again when $w/n > \omega^+_{easy}$. Therefore, enlarging the range where inverting $f_H$ is easy seems to be a hard problem. In the following subsection we present a trapdoor on the matrices H which enables us to invert $f_H$ in polynomial time on a larger range than $[\omega^-_{easy}, \omega^+_{easy}]$, by tweaking the Prange decoder.

4.2 Solution with Trapdoor 

Let us introduce the family of codes (this is the trapdoor) that we consider to invert $f_H$. As we will see, this family comes with a simple algorithm which enables us to invert $f_H$ with errors of relative weight in $[\omega^-_{UV}, \omega^+_{UV}] \subset [\omega^-, \omega^+]$, where $[\omega^-_{easy}, \omega^+_{easy}] \subsetneq [\omega^-_{UV}, \omega^+_{UV}]$. We summarize this situation in Figure 3.


Fig. 3. Hardness of (U,U+V) Decoding (along the w axis, the generic easy region is contained in a larger region that is easy with the (U,U+V) trapdoor, surrounded by hard regions).


Definition 2 (Generalized admissible (U, U + V)-codes). Let n be an even integer and let $D_1, \dots, D_4$ be four diagonal matrices of size n/2 over $\mathbb{F}_q$ such that

$D = \begin{pmatrix} D_1 & D_2 \\ D_3 & D_4 \end{pmatrix}$ is invertible and $\forall i \in [1, n/2],\ D_1(i,i)D_3(i,i) \neq 0$.   (12)

Let U, V be linear q-ary codes of length n/2 and dimension $k_U$, $k_V$. We define the subset of $\mathbb{F}_q^n$

$(UD_1 + VD_2, UD_3 + VD_4) = \{(uD_1 + vD_2, uD_3 + vD_4) \text{ such that } u \in U \text{ and } v \in V\}$

which is a linear code of length n and dimension $k = k_U + k_V$. A parity-check matrix of such a code is given by

$\begin{pmatrix} H_U D_4 M & -H_U D_2 M \\ -H_V D_3 M & H_V D_1 M \end{pmatrix}$

where $M = (D_1D_4 - D_3D_2)^{-1}$ and $H_U \in \mathbb{F}_q^{(n/2-k_U)\times n/2}$ (resp. $H_V \in \mathbb{F}_q^{(n/2-k_V)\times n/2}$) is a parity-check matrix of U (resp. V).
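The parity-check matrix of Definition 2 can be checked mechanically. In the following ternary sketch (hypothetical names; each diagonal matrix is stored as the list of its diagonal entries), $cH^T = (uH_U^T, vH_V^T)$ holds for $c = (uD_1 + vD_2, uD_3 + vD_4)$ and arbitrary u, v, so every codeword (u ∈ U, v ∈ V) has zero syndrome.

```python
q = 3  # ternary alphabet, as in the scheme; any prime q works here


def parity_check_generalized_uv(HU, HV, d1, d2, d3, d4):
    """Parity-check matrix of (U D1 + V D2, U D3 + V D4) following Definition 2.
    d1..d4 are the diagonals of D1..D4 (lists of length n/2)."""
    half = len(d1)
    # Diagonal of M = (D1 D4 - D3 D2)^(-1); non-zero because D is invertible.
    m = [pow((d1[i] * d4[i] - d3[i] * d2[i]) % q, q - 2, q) for i in range(half)]
    top = [[row[i] * d4[i] * m[i] % q for i in range(half)]
           + [-row[i] * d2[i] * m[i] % q for i in range(half)] for row in HU]
    bottom = [[-row[i] * d3[i] * m[i] % q for i in range(half)]
              + [row[i] * d1[i] * m[i] % q for i in range(half)] for row in HV]
    return top + bottom


def encode(u, v, d1, d2, d3, d4):
    """(u D1 + v D2, u D3 + v D4) for row vectors u, v of length n/2."""
    left = [(a * x + b * y) % q for a, b, x, y in zip(u, v, d1, d2)]
    right = [(a * x + b * y) % q for a, b, x, y in zip(u, v, d3, d4)]
    return left + right


def syndrome(x, H):
    """Return x H^T over F_q."""
    return [sum(a * b for a, b in zip(x, row)) % q for row in H]
```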

For the sake of simplicity in the description of the algorithm to invert $f_H$ when H is a parity-check matrix of a generalized admissible (U, U + V) code, we restrict our study to the case $D_1 = D_3 = D_4 = I_{n/2}$ and $D_2 = 0_{n/2}$, which corresponds to standard (U, U + V)-codes. However, all our discussion (especially this subsection and the following one) can be generalized.

It turns out that in the case of a (U, U + V) code, a simple tweak of the Prange decoder is able to reach relative weights w/n outside the "easy" region $[\omega^-_{easy}, \omega^+_{easy}]$. Let us first explain how the idea works in the case of a (U, U + V)-code. It exploits the fundamental leverage of the Prange decoder: when the code that we decode is of dimension k, we can choose the error e satisfying $eH^T = s$ as we want in k positions. When we want an error of low weight, we put zeroes on those positions, whereas if we want an error of large weight, we put non-zero values. This idea can be adapted to the case of a (U, U + V)-code. The parity-check matrix of such a code can be chosen to be


$H = \begin{pmatrix} H_U & 0 \\ -H_V & H_V \end{pmatrix}$   (13)


Solving $eH^T = s$ with H as in (13) amounts to solving

$e_U H_U^T = s_U$   (14)
$e_V H_V^T = s_V$   (15)

where we split $s = (s_U, s_V)$ and write $e = (e_U, e_U + e_V)$. Performing the two decodings (14) and (15) independently with the Prange algorithm gains nothing. However, if we first solve (15) with the Prange algorithm, and then seek a solution of (14) which properly depends on $e_V$, we increase the easy range of weights accessible for e. It then turns out that the range $[\omega^-_{UV}, \omega^+_{UV}]$ of relative weights w/n for which the (U, U + V) decoder is easy is larger than the generic easy domain $[\omega^-_{easy}, \omega^+_{easy}]$, see Fig. 3. This provides an advantage to the trapdoor owner.


Tweaking the Prange Decoder for Reaching Low Weights. In this case, we first look, with the help of the Prange decoder, for the $e_V$ of lowest possible weight satisfying (15). We can attain a weight for $e_V$ of $\frac{q-1}{q}(n/2 - k_V)$. In a second step we have to find $e_U$ satisfying (14). The point is now that we will not look for the $e_U$ of lowest possible weight satisfying (14); we will instead use the knowledge of $e_V$ to do better. To understand this point, note that what we want to do now is to minimize the weight of $e = (e_U, e_U + e_V)$ given $e_V$ and $e_U H_U^T = s_U$. The strategy of Prange is to choose the value of $e_U$ in $k_U$ positions (the "information set") and to complete the rest of the values with the help of the equation $e_U H_U^T = s_U$. For this, we consider the set of positions i for which $e_V(i) = 0$. Without loss of generality, we may assume that these are the first $n/2 - \frac{q-1}{q}(n/2 - k_V) = \frac{n}{2q} + \frac{q-1}{q}k_V$ positions of $e_V$, and that $e_U$ and $e_V$ split as


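$e_U = (e'_U, e''_U)$
$e_V = (0, e''_V)$

where $e'_U$ is in $\mathbb{F}_q^{\frac{n}{2q} + \frac{q-1}{q}k_V}$ and $e''_U$, $e''_V$ are in $\mathbb{F}_q^{\frac{q-1}{q}(n/2 - k_V)}$, with all entries of $e''_V$ non-zero. The error e we are looking for is therefore of the form

$e = (e'_U, e''_U, e'_U, e''_U + e''_V)$.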


This is also represented in Figure 4.

Fig. 4. The form of the errors $e_V = (0, e''_V)$ and $e = (e'_U, e''_U, e'_U, e''_U + e''_V)$, where the zero block of $e_V$ has length $\frac{n}{2q} + \frac{q-1}{q}k_V$.


To get an error of smallest weight for e, it really makes sense to choose as many zeros as we can in $e'_U$: they are doubled in e. Choosing a position of $e''_U$ to be 0 is less helpful, since the corresponding position in $e''_U + e''_V$ is non-zero in this case. There are two cases to consider.

Case 1: $k_U \leq \frac{n}{2q} + \frac{q-1}{q}k_V$. Here we choose an information set for the Prange decoder among the positions of $e'_U$ and ask in the Prange step applied to $H_U$ and $s_U$ that $e_U$ is zero on these positions. Here, the expected weight of e is given by

$\mathbb{E}(|e|) = 2\frac{q-1}{q}\left(\frac{n}{2q} + \frac{q-1}{q}k_V - k_U\right) + 2\frac{q-1}{q}\left(\frac{q-1}{q}(n/2 - k_V)\right) = \frac{q-1}{q}n - 2\frac{q-1}{q}k_U$   (16)


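Case 2: $k_U > \frac{n}{2q} + \frac{q-1}{q}k_V$. In the Prange step applied to $H_U$ and $s_U$, we choose the information set for U so that it contains all the first $\frac{n}{2q} + \frac{q-1}{q}k_V$ positions (i.e. those of $e'_U$) and choose $e_U$ to be zero on these positions. Each of the remaining $k_U - \frac{n}{2q} - \frac{q-1}{q}k_V$ information-set positions lies in $e''_U$ and, with $e_U$ zero there, contributes exactly one non-zero symbol to e. In this case, we have

$\mathbb{E}(|e|) = k_U - \frac{n}{2q} - \frac{q-1}{q}k_V + 2\frac{q-1}{q}(n/2 - k_U) = \left(1 - \frac{3}{2q}\right)n - \frac{q-2}{q}k_U - \frac{q-1}{q}k_V$   (17)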


It is readily seen that if we want to minimize the expected weight of e for a given dimension $k = k_U + k_V$ of the (U, U + V)-code, it is always better to use the first strategy. From this it can be verified that as long as $k \leq \frac{n}{2q}$ the best we can do is to choose $k_U = k$ and $k_V = 0$. This leads in this regime to an expected weight given by (16) with $k_U = k$:

$\mathbb{E}(|e|) = \frac{q-1}{q}n - 2\frac{q-1}{q}k$.   (18)


For larger values of k, it can be verified that the best we can do is to enforce $k_U = \frac{n}{2q} + \frac{q-1}{q}k_V$, which together with the relation $k = k_U + k_V$ implies that

$k_U = \frac{n}{2(2q-1)} + \frac{(q-1)k}{2q-1}$.   (19)


Plugging this expression into (16) leads to

$\mathbb{E}(|e|) = \frac{2(q-1)^2}{(2q-1)q}(n - k)$.   (20)
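To compare (18)-(20) with the generic Prange decoder, the following sketch (our own; hypothetical names) evaluates the smallest expected weight reachable with the (U, U + V) tweak next to $n\omega^-_{easy} = \frac{q-1}{q}(n-k)$. For q = 3, n = 1000 and k = 500 it gives about 266.7 against 333.3.

```python
def uv_min_expected_weight(n, k, q):
    """Smallest expected weight of the low-weight (U,U+V) tweak, eqs (18)-(20)."""
    if k <= n / (2 * q):
        # k_U = k and k_V = 0: equation (18)
        return (q - 1) / q * n - 2 * (q - 1) / q * k
    # k_U given by (19), plugged into (16): equation (20)
    return 2 * (q - 1) ** 2 / ((2 * q - 1) * q) * (n - k)


def generic_min_expected_weight(n, k, q):
    """n * omega^-_easy from equation (8)."""
    return (q - 1) / q * (n - k)
```

The two regimes agree at $k = \frac{n}{2q}$, where both give $\frac{(q-1)^2}{q^2}n$.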


Tweaking the Prange Decoder for Reaching Large Weights. When q = 2, small and large weights play a symmetrical role. This is not the case anymore for $q \geq 3$. Indeed, in this case the best Prange strategy does not take into account the weight of $e_V$. Assume here that we have chosen any $e_V$ satisfying $e_V H_V^T = s_V$. How does the second decoding, for $e_U$, take this into account? We want here to find the $e_U$ that maximizes the weight of $(e_U, e_U + e_V)$ given that $e_U H_U^T = s_U$. Recall that the Prange strategy consists in choosing the value of $e_U$ in $k_U$ positions and completing the rest of the values with the help of the equation $e_U H_U^T = s_U$. Here, for any position i, it is always possible to choose $e_U(i)$ such that both $e_U(i)$ and $e_U(i) + e_V(i)$ are non-zero. By using this strategy, we see that for $q \geq 3$ the expected weight of e becomes

$\mathbb{E}(|e|) = 2k_U + (n - 2k_U)\frac{q-1}{q} = \frac{q-1}{q}n + \frac{2k_U}{q}$.   (21)

The best choice for $k_U$ is to take $k_U = k$ up to the point where $\frac{q-1}{q}n + \frac{2k_U}{q} = n$, that is k = n/2. For larger values of k we choose $k_U = n/2$ and $k_V = k - k_U$.
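Similarly, (21) can be set against $n\omega^+_{easy} = k + \frac{q-1}{q}(n-k)$: a short computation shows that the gain is exactly k/q as long as k ≤ n/2. The sketch below (our own; hypothetical names) encodes the choice $k_U = \min(k, n/2)$ described above.

```python
def uv_max_expected_weight(n, k, q):
    """Expected weight (21) of the large-weight (U,U+V) tweak, with k_U = min(k, n/2)."""
    if q < 3:
        raise ValueError("the large-weight tweak needs q >= 3")
    k_U = min(k, n / 2)
    return (q - 1) / q * n + 2 * k_U / q


def generic_max_expected_weight(n, k, q):
    """n * omega^+_easy from equation (9)."""
    return k + (q - 1) / q * (n - k)
```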
Trapdoor Pseudocode. The decoder for (U, U + V) codes that we just described, when we want to reach large weights, is given in detail in Algorithm 2. We name it DecodeUV(·); it parses its first argument H as a parity-check matrix of some (U, U + V) code. The decoding consists of two elementary steps, each using the basic Prange information set decoding. The second step is repeated in order to ensure that the final weight is w, the target for a valid signature. As described, this algorithm may have a biased output and is susceptible to leak information on the secret key. Avoiding this is mandatory to obtain a proof in the GPV model. In the sequel we show how to adapt the algorithm, in particular by adding rejection sampling, to avoid this leakage and complete the security proof.


Algorithm 2 DecodeUV(H, s) — Main signature building block

Parameters: n, k, $k_U$, $k_V = k - k_U$, w
Require: $H = \begin{pmatrix} H_U & 0 \\ -H_V & H_V \end{pmatrix} \in \mathbb{F}_q^{(n-k)\times n}$, $H_U \in \mathbb{F}_q^{(n/2-k_U)\times n/2}$, $s = (s_U, s_V) \in \mathbb{F}_q^{n-k}$, $s_U \in \mathbb{F}_q^{n/2-k_U}$
Ensure: $eH^T = s$ and $|e| = w$   ▷ implicit: $H_V \in \mathbb{F}_q^{(n/2-k_V)\times n/2}$ and $s_V \in \mathbb{F}_q^{n/2-k_V}$
1: $e_V \leftarrow D_V(H_V, s_V)$
2: repeat
3:   $e_U \leftarrow D_U(H_U, s_U, e_V)$
4:   $e \leftarrow (e_U, e_U + e_V)$
5: until $|e| = w$
6: return e

function $D_V(H_V, s_V)$
  $I_V \leftarrow$ InfoSet($H_V$)   ▷ InfoSet returns an information set
  return PrangeStep($H_V, s_V, I_V, 0$)
function $D_U(H_U, s_U, e_V)$
  $I_U \leftarrow$ InfoSet($H_U$)   ▷ InfoSet returns an information set
  $x_U \xleftarrow{\$} \{x \in \mathbb{F}_q^{n/2} \mid \mathrm{Supp}(x) = I_U \text{ and } x_J = (e_V)_J,\ J = I_U \cap \mathrm{Supp}(e_V)\}$
  return PrangeStep($H_U, s_U, I_U, x_U$)
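For q = 3 the choice of $x_U$ in $D_U$ can be made concrete. In the following sketch (our own; −1 is stored as 2 and the helper name is hypothetical), copying $e_V(i)$ on positions where it is non-zero gives $e_U(i) + e_V(i) = 2e_V(i) = -e_V(i) \neq 0$, and positions where $e_V(i) = 0$ receive a random non-zero symbol, so both halves of e are non-zero on the whole information set.

```python
import random


def large_weight_xu(eV, info_set, rng=random):
    """x_U for D_U of Algorithm 2 over F_3: Supp(x) = info_set and
    x_J = (e_V)_J on J = info_set ∩ Supp(e_V)."""
    x = [0] * len(eV)
    for i in info_set:
        x[i] = eV[i] if eV[i] % 3 else rng.choice((1, 2))
    return x
```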


All of this discussion is summarized in Figure 5, where we draw $\omega^-_{UV}$ and $\omega^+_{UV}$, the smallest and the highest relative distances that our decoder can reach asymptotically in n when k/n is fixed and q = 3.


5 Obtaining a Preimage Sampleable Scheme 

We restrict our study here to the case q = 3, but it can be generalized to larger values of q.


5.1 Achieving the Domain Sampling with the Generalized Admissible (U,U + V) 
Code Family 

We will denote in the rest of the article by $H_{pk}$ the random matrix chosen as the public parity-check matrix of our scheme $\mathcal{S}_{code}$, which is defined in §3.1. Such a public key is defined as

$H_{pk} = S H_{sk} P$

with

$H_{sk} = \begin{pmatrix} H_U D_4 M & -H_U D_2 M \\ -H_V D_3 M & H_V D_1 M \end{pmatrix}$

where $D_1, \dots, D_4$ are four diagonal matrices which verify (12) and $M = (D_1D_4 - D_3D_2)^{-1}$; S is chosen uniformly at random among the invertible ternary matrices of size (n − k) × (n − k), $H_U$ is chosen uniformly at random among the ternary matrices of size $(n/2 - k_U) \times n/2$, $H_V$ is chosen uniformly at random among the ternary matrices of size $(n/2 - k_V) \times n/2$, and P is chosen uniformly at random among the permutation matrices of size n × n. Thanks to the knowledge of $H_{sk}$, which is a parity-check matrix of an admissible generalized (U, U + V)-code, and to the previous algorithm, we can invert $f_{H_{pk}}$ on any input.

Fig. 5. Areas of relative signature distances with our trapdoor when q = 3 (horizontal axis: w/n).

Let us now give the following definition, which helps to better understand the structure of admissible generalized (U, U + V)-codes.

Definition 3 (number of V blocks of type I). In a generalized (U, U + V) code of length n associated to the 4-tuple of diagonal matrices $(D_1, D_2, D_3, D_4)$ of size n/2, the number of V blocks of type I, which we denote by $n_I$, is defined by:

$n_I = |\{1 \leq i \leq n/2 : D_2(i,i)D_4(i,i) = 0\}|$.


Remark 3. $n_I$ can be viewed as the number of positions in which a codeword of the form $(vD_2, vD_4)$ is necessarily equal to 0. This comes from the fact that on a position where either $D_2(i,i) = 0$ or $D_4(i,i) = 0$, the other one is necessarily different from 0, as $D = \begin{pmatrix} D_1 & D_2 \\ D_3 & D_4 \end{pmatrix}$ is invertible. In other words

$n_I = |\{1 \leq i \leq n/2 : D_2(i,i) = 0\}| + |\{1 \leq i \leq n/2 : D_4(i,i) = 0\}|$.
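Both expressions of $n_I$ in Remark 3 can be compared directly. The sketch below (hypothetical names; diagonals stored as lists over $\mathbb{F}_3$) assumes, as guaranteed by the invertibility of D, that $D_2(i,i)$ and $D_4(i,i)$ are never both zero.

```python
def n_type_one(d2, d4, q=3):
    """n_I of Definition 3: positions i with D2(i,i) D4(i,i) = 0 (q prime)."""
    return sum(1 for a, b in zip(d2, d4) if (a * b) % q == 0)


def n_type_one_by_zeros(d2, d4, q=3):
    """Second expression of Remark 3: zeros on the diagonal of D2 plus those of D4."""
    return sum(1 for a in d2 if a % q == 0) + sum(1 for b in d4 if b % q == 0)
```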


The random structure of such matrices $H_{pk}$ (obtained by choosing the matrices $H_U$ and $H_V$ uniformly at random) makes the syndromes associated to matrices $H_{pk}$ indistinguishable, in a very strong sense, from random syndromes, as the following proposition shows. In this way, our scheme achieves the Domain Sampling property of Definition 1.

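Proposition 3. Let $\mathcal{D}_w^{H_{pk}}$ be the distribution of the syndromes $eH_{pk}^T$ when e is drawn uniformly at random among the ternary vectors of weight w, and let $\mathcal{U}$ be the uniform distribution over the syndrome space $\mathbb{F}_3^{n-k}$. We have

$\mathbb{E}_{H_{pk}}\left(\rho(\mathcal{D}_w^{H_{pk}}, \mathcal{U})\right) \leq \frac{1}{2}\sqrt{\varepsilon}$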
with ε given by an explicit expression (22), made of sums of products of binomial coefficients and powers of 2 and 3 that depend on n, w, $k_U$, $k_V$ and $n_I$.


Remark 4. In the paradigm of our code-based signatures we have w greater than the Gilbert-Varshamov bound, which gives $3^{n-k} \ll 2^w\binom{n}{w}$, and for the set of parameters we present in §9, $\log_2(\varepsilon) = -1034$.

The proof of this proposition is given in Appendix B and relies, among other things, on the following lemma, which is a variation of the leftover hash lemma (see [BDK+11]):

Lemma 1. Consider a finite family $\mathcal{H}$ of functions from a finite set E to a finite set F. Denote by ε the bias of the collision probability, i.e. the quantity such that

$\mathbb{P}_{h,e,e'}(h(e) = h(e')) = \frac{1}{|F|}(1 + \varepsilon)$

where h is drawn uniformly at random in $\mathcal{H}$, and e and e' are drawn uniformly at random in E. Let $\mathcal{U}$ be the uniform distribution over F and $\mathcal{D}(h)$ be the distribution of the outputs h(e) when e is chosen uniformly at random in E. We have

$\mathbb{E}_h\left(\rho(\mathcal{D}(h), \mathcal{U})\right) \leq \frac{1}{2}\sqrt{\varepsilon}$.
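Lemma 1 can be checked exactly on a toy family. The sketch below is our own illustration: it uses all 1 × n matrices H over $\mathbb{F}_3$ (not the structured $H_{pk}$), takes E to be the words of weight w, computes ε from the collision probability, and compares $\mathbb{E}_h(\rho(\mathcal{D}(h), \mathcal{U}))$ with $\frac{1}{2}\sqrt{\varepsilon}$, where ρ is taken to be half the $\ell_1$ distance. The function name is hypothetical.

```python
from itertools import product
from math import sqrt


def lemma1_check(n=4, w=2, q=3):
    """Return (E_h rho(D(h), U), sqrt(eps) / 2) for h_H(e) = e H^T, H in F_q^(1 x n)."""
    E = [e for e in product(range(q), repeat=n) if sum(x != 0 for x in e) == w]
    family = list(product(range(q), repeat=n))

    def h(H, e):
        return sum(a * b for a, b in zip(H, e)) % q

    avg_dist, coll = 0.0, 0.0
    for H in family:
        counts = [0] * q
        for e in E:
            counts[h(H, e)] += 1
        probs = [c / len(E) for c in counts]
        coll += sum(p * p for p in probs)               # P_{e,e'}(h(e) = h(e'))
        avg_dist += 0.5 * sum(abs(p - 1 / q) for p in probs)
    coll /= len(family)
    avg_dist /= len(family)
    eps = q * coll - 1                                  # coll = (1 + eps) / |F|
    return avg_dist, 0.5 * sqrt(eps)
```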

Remark 5. In the leftover hash lemma, there is the additional assumption that $\mathcal{H}$ is a universal family of hash functions, meaning that for any distinct e and e' in E, we have $\mathbb{P}_h(h(e) = h(e')) = \frac{1}{|F|}$. This assumption gives a general bound on the bias ε. In our case, where the h's are hash functions defined as $h(e) = eH_{pk}^T$, $\mathcal{H}$ does not form a universal family of hash functions (essentially because the distribution of the $H_{pk}$'s is not the uniform distribution over $\mathbb{F}_3^{(n-k)\times n}$). However, in our case we can still bound ε by a direct computation. This lemma is proved in Appendix B.

5.2 Achieving a Uniformly Distributed Output 

To be a one-way preimage sampleable function, we have to ensure that the outputs are very close to uniformly distributed over $S_w$. Algorithm 2, which uses the Prange decoder directly, does not meet this property. However, by changing it slightly, we will achieve this while still keeping the property of outputting errors of weight w for which solving the decoding problem is hard. We summarize the situation in Figure 6.

The template remains the same, but the functions $D_U$ and $D_V$ will be modified to include some rejection sampling. The Prange decoders that are used here naturally achieve some kind of uniformity on the output.

Definition 4 (uniform decoder). A V-decoder $D(H_V, s_V)$ taking as input $H_V$ in $\mathbb{F}_3^{(n/2-k_V)\times n/2}$, $s_V$ in $\mathbb{F}_3^{n/2-k_V}$ and outputting $e_V$ in $\mathbb{F}_3^{n/2}$ satisfying $e_V H_V^T = s_V$ is uniform with respect to $H_V$ if $\mathbb{P}(e_V = D(H_V, s_V))$ is just a function of $|e_V|$ when $s_V$ is chosen uniformly at random in $\mathbb{F}_3^{n/2-k_V}$.
Fig. 6. Hardness of (U,U+V) Decoding with no leakage of signature (region labels: hard, easy, easy with (U,U+V) trapdoor, no leakage with (U,U+V) trapdoor).


The Prange decoder used for producing $e_U$ does not have such a nice property: its output depends in a crucial way on $e_V$. The following notation will be useful.

Notation 1. For $x \in \mathbb{F}_3^n$ with n even, let $x_U$ and $x_V$ be in $\mathbb{F}_3^{n/2}$ such that $x = (x_U, x_U + x_V)$. We also denote by $\ell_1(x)$ and $\ell_{-1}(x)$ the following quantities:

$\ell_1(x) = |\{i \mid x_V(i) = 1, x_U(i) \neq 1\}|$
$\ell_{-1}(x) = |\{i \mid x_V(i) = -1, x_U(i) \neq -1\}|$.
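Notation 1 translates directly into code. In this sketch (hypothetical names), elements of $\mathbb{F}_3$ are stored as 0, 1, 2, with 2 standing for −1.

```python
def split_uv(x):
    """Return (x_U, x_V) such that x = (x_U, x_U + x_V) over F_3."""
    half = len(x) // 2
    x_u = list(x[:half])
    x_v = [(b - a) % 3 for a, b in zip(x[:half], x[half:])]
    return x_u, x_v


def ell(x, b):
    """ell_b(x) = |{i : x_V(i) = b and x_U(i) != b}| for b in {1, -1}."""
    x_u, x_v = split_uv(x)
    b %= 3                              # -1 is represented by 2
    return sum(1 for u, v in zip(x_u, x_v) if v == b and u != b)
```

For example, x = (1, 0, 2, 2, 1, 0) gives $x_U = (1, 0, 2)$, $x_V = (1, 1, 1)$, $\ell_1(x) = 2$ and $\ell_{-1}(x) = 0$.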

Definition 5 (weakly uniform decoder). Let D be a decoder with input $H_U$ in $\mathbb{F}_3^{(n/2-k_U)\times n/2}$, $s_U$ in $\mathbb{F}_3^{n/2-k_U}$ and $e_V$ in $\mathbb{F}_3^{n/2}$, which outputs an $e = (e_U, e_U + e_V)$ in $\mathbb{F}_3^n$ such that $e_U H_U^T = s_U$. Let $f(e) = (|e_V|_1, |e_V|_{-1}, \ell_1(e), \ell_{-1}(e), |e|)$. D is weakly uniform with respect to $H_U$ if $\mathbb{P}(e = D(H_U, s_U, e_V))$ is a function of f(e) when $s_U$ and $e_V$ are chosen uniformly at random in their range.

Our U-decoder based on the Prange decoder meets this property in a natural way. With such decoders, it is rather easy to obtain a uniformly distributed output for the combined (U, U + V)-decoder. The uniform distribution of the output of our decoder follows at once from the following lemma.

Lemma 2. Let $e = (e_U, e_U + e_V)$ be the output of Algorithm 2 when $s_V$ and $s_U$ are chosen uniformly at random in $\mathbb{F}_3^{n/2-k_V}$ and $\mathbb{F}_3^{n/2-k_U}$ respectively. Assume that $D_U$ is weakly uniform whereas $D_V$ is uniform. Let

$e^{unif} = (e_U^{unif}, e_U^{unif} + e_V^{unif})$

be a uniformly distributed error of weight w. Let

$g(e_V) = (|e_V|_1, |e_V|_{-1})$ and $h(e) = (\ell_1(e), \ell_{-1}(e), |e|)$.

If $\rho(|e_V|, |e_V^{unif}|) = 0$ and $\mathbb{P}(h(e) = z \mid g(e_V) = y) = \mathbb{P}(h(e^{unif}) = z \mid g(e_V^{unif}) = y)$ for any possible y and z, then

$\rho(e, e^{unif}) = 0$.

Proof. Let x be in $S_w$, let $H_U$ be the matrix used in $D_U$ and let $f(e) = (g(e_V), h(e))$. We have

$\mathbb{P}_{s_U,s_V}(e = x) = \mathbb{P}_{s_U,s_V}(e = x \mid f(e) = (y,z))\,\mathbb{P}_{s_U,s_V}(f(e) = (y,z))$,

where $f(x) = (y,z)$. Observe now that

$\mathbb{P}_{s_U,s_V}(f(e) = (y,z)) = \mathbb{P}_{s_U,s_V}(h(e) = z \mid g(e_V) = y)\,\mathbb{P}_{s_V}(g(e_V) = y)$
$= \mathbb{P}(h(e^{unif}) = z \mid g(e_V^{unif}) = y)\,\mathbb{P}(g(e_V^{unif}) = y)$,

where $\mathbb{P}_{s_V}(g(e_V) = y) = \mathbb{P}(g(e_V^{unif}) = y)$ because $e_V$ and $e_V^{unif}$ have the same distribution. This follows from the uniformity of $D_V$ and $\rho(|e_V|, |e_V^{unif}|) = 0$. Since $e_V$ is uniformly distributed conditioned on its weight, we have

$\mathbb{P}_{s_U,s_V}(e = x \mid f(e) = (y,z)) = \mathbb{P}_{s_U,e_V}(e = D_U(H_U, s_U, e_V) \mid f(e) = f(x)) = \mathbb{P}(e^{unif} = x \mid f(e^{unif}) = f(x))$,

where the last equality follows from the weak uniformity of $D_U$. Using all these equations, we get

$\mathbb{P}_{s_U,s_V}(e = x) = \mathbb{P}(e^{unif} = x \mid f(e^{unif}) = (y,z))\,\mathbb{P}(h(e^{unif}) = z \mid g(e_V^{unif}) = y)\,\mathbb{P}(g(e_V^{unif}) = y)$
$= \mathbb{P}(e^{unif} = x)$,

which concludes the proof.

The rationale of our algorithm is then to ensure, by some mild rejection sampling, that $D_U$ and $D_V$ behave as required by Lemma 2. We summarize how we perform the decoding in Figure 7. It relies, among other things, on the crucial notion of information set in the Prange algorithm. The rejection sampling will be over the weight of $e_V$, which is the output of $D_V$, and over $\ell_1(e)$, $\ell_{-1}(e)$, which are functions of the output of $D_U$.



Fig. 7. Summary of the decoding 


The pseudo-code of $D_V$ is given in Algorithm 3. The algorithm for the U-decoder is slightly more involved and is given in Algorithm 4.

Algorithm 3 $D_V$, the V-decoder outputting an $e_V$ such that $e_V H_V^T = s_V$.

Parameters: n, $k_V$   ▷ n/2 is the length of the code we decode, $k_V$ its dimension
  $\mathcal{D}_V$ a distribution over $[0, k_V]$
  $r_V$ a rejection sampling vector, taking values in [0, 1]; Accept(i, r) is true with probability r(i)

function $D_V(H_V, s_V)$
  repeat
    $I \leftarrow$ InfoSet($H_V$)
    $t \leftarrow \mathcal{D}_V$
    $x \xleftarrow{\$} \{x \in \mathbb{F}_3^{n/2} \mid |x_I| = t\}$
    $e_V \leftarrow$ PrangeStep($H_V, s_V, I, x$)
  until Accept($|e_V|, r_V$)
  return $e_V$

By setting up the rejection vectors $r_U$ and $r_V$ appropriately, we can reach the uniform distribution over $S_w$ in Algorithm 2. Rejection sampling on $D_V$ is here to meet the first condition of Lemma 2, $\rho(|e_V|, |e_V^{unif}|) = 0$, whereas rejection sampling on $D_U$ ensures that the conditional distribution of h(e) given $g(e_V)$ is the same as the distribution of $h(e^{unif})$ given $g(e_V^{unif})$. This is shown in Theorem 1.


Algorithm 4 $D_U$, the U-decoder outputting an $e_U$ such that $e_U H_U^T = s_U$.

Parameters: n, $k_U$   ▷ n/2 is the length of the code we decode, $k_U$ its dimension
  $(\mathcal{D}_U^{t_1,t_{-1}})_{0 \leq t_1, t_{-1} \leq n/2}$ a family of probability distributions over $\{(j_1, j_{-1}) \mid 0 \leq j_1 \leq t_1,\ 0 \leq j_{-1} \leq t_{-1},\ 0 \leq j_1 + j_{-1} \leq k_U\}$
  $(r_U^{t_1,t_{-1}})_{0 \leq t_1, t_{-1} \leq n/2}$ a family of rejection sampling vectors, taking values in [0, 1]; Accept((i, j), r) is true with probability r(i, j)

function $D_U(H_U, s_U, e_V)$
  $t_1, t_{-1} \leftarrow |e_V|_1, |e_V|_{-1}$
  repeat
    $(j_1, j_{-1}) \leftarrow \mathcal{D}_U^{t_1,t_{-1}}$
    $I \leftarrow$ InfoSetW($H_U, e_V, j_1, j_{-1}$)   ▷ InfoSetW() is defined below
    $x \xleftarrow{\$} \{x \in \mathbb{F}_3^{n/2} \mid \mathrm{Supp}(x) = I \text{ and } x_J = (e_V)_J,\ J = I \cap \mathrm{Supp}(e_V)\}$
    $e_U \leftarrow$ PrangeStep($H_U, s_U, I, x$)
    $\ell_1, \ell_{-1} \leftarrow |\{i \mid e_V(i) = 1, e_U(i) \neq 1\}|, |\{i \mid e_V(i) = -1, e_U(i) \neq -1\}|$
  until Accept($(\ell_1, \ell_{-1}), r_U^{t_1,t_{-1}}$)
  return $e_U$

function InfoSetW(H, e, $j_1$, $j_{-1}$) — weighted information set
Require: $H \in \mathbb{F}_3^{(n-k)\times n}$, $e \in \mathbb{F}_3^n$, $j_1, j_{-1}$ non-negative integers such that $j_1 + j_{-1} \leq k$
Ensure: the returned value I is an information set of $\langle H\rangle^\perp$ such that the number of positions i in I for which $e_i = b$ is equal to $j_b$, for b in {1, −1}

Theorem 1. Let $e^{unif}$ be a vector chosen uniformly at random over $S_w$. Let $r_V$ be defined for any $i \in \{0, \dots, n/2\}$ as

$r_V(i) = \frac{1}{M_{rs}^V}\,\frac{q_1^{unif}(i)}{q_1(i)}$

with $q_1^{unif}(i) = \mathbb{P}(|e_V^{unif}| = i)$, $q_1(i) = \mathbb{P}(|D_V(H_V, s_V)| = i)$ and $M_{rs}^V = \sup_{0 \leq i \leq n/2} \frac{q_1^{unif}(i)}{q_1(i)}$. The rejection probability vector $r_U^{t_1,t_{-1}}$ is chosen for any $i \in [0, t_1]$ and $j \in [0, t_{-1}]$ as

$r_U^{t_1,t_{-1}}(i,j) = \frac{1}{M_{rs}^U(t_1,t_{-1})}\,\frac{q_2^{unif}(i,j \mid t_1,t_{-1})}{q_2(i,j \mid t_1,t_{-1})}$

with

$q_2^{unif}(i,j \mid t_1,t_{-1}) = \mathbb{P}\left(\ell_1(e^{unif}) = i, \ell_{-1}(e^{unif}) = j \mid |e_V^{unif}|_b = t_b,\ b = \pm 1\right)$
$q_2(i,j \mid t_1,t_{-1}) = \mathbb{P}\left(\ell_1(e) = i, \ell_{-1}(e) = j \mid |e_V^{unif}|_b = t_b,\ b = \pm 1\right)$
$M_{rs}^U(t_1,t_{-1}) = \sup_{0 \leq i \leq t_1,\ 0 \leq j \leq t_{-1}} \frac{q_2^{unif}(i,j \mid t_1,t_{-1})}{q_2(i,j \mid t_1,t_{-1})}$

where $e = (e_U, e_U + e_V^{unif})$, with $e_U$ being the output of PrangeStep($H_U, s_U, I, x$) in Algorithm 4 and $(H_U, s_U, e_V^{unif})$ its input, where $s_U$ is uniformly distributed. Then, if $D_U$ is weakly uniform and $D_V$ uniform, the output e of Algorithm 2 satisfies

$\rho(e, e^{unif}) = 0$.
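The acceptance probabilities of Theorem 1 follow the usual rejection-sampling pattern. The generic sketch below is ours: it takes an arbitrary source distribution `p_src` (playing the role of $q_1$) and target `p_target` (playing the role of $q_1^{unif}$), both given as lists; computing the actual $q_1$ and $q_1^{unif}$ of the scheme is not shown. Accepted samples follow `p_target`, and the expected number of loop iterations is M, which matches the role of $M_{rs}^V$.

```python
import random


def rejection_vector(p_src, p_target):
    """r(i) = p_target(i) / (M p_src(i)) with M = max_i p_target(i) / p_src(i).
    Assumes the support of p_target is contained in that of p_src."""
    M = max(t / s for s, t in zip(p_src, p_target) if s > 0)
    r = [t / (M * s) if s > 0 else 0.0 for s, t in zip(p_src, p_target)]
    return r, M


def sample_with_rejection(sample_src, r, rng=random):
    """Draw i = sample_src() until ACCEPT(i, r) succeeds, i.e. with probability r[i]."""
    while True:
        i = sample_src()
        if rng.random() < r[i]:
            return i
```

Each draw is accepted with overall probability $\sum_i p_{src}(i) r(i) = 1/M$, and the accepted value equals i with probability proportional to $p_{src}(i) r(i) \propto p_{target}(i)$.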


To have an efficient algorithm, it is essential that $M_{rs}^V$ and the $M_{rs}^U(t_1,t_{-1})$'s are as small as possible. These numbers represent the average number of times the repeat loops in those algorithms are performed. This is achieved by a particular choice of the parameters w, $k_U$, $k_V$. Roughly speaking, the idea for making $M_{rs}^V$ and the $M_{rs}^U(t_1,t_{-1})$'s small is that the output $e_V$ of PrangeStep($H_V, s_V, I, x$) in Algorithm 3 and the output $e_U$ of PrangeStep($H_U, s_U, I, x$) in Algorithm 4 satisfy

$\mathbb{E}(|e_V|) = \mathbb{E}(|e_V^{unif}|)$ and $\mathbb{E}(|e_U|) = \mathbb{E}(|e_U^{unif}|)$.


We also require that $\mathbb{E}(|(e_U, e_U + e_V^{unif})|) = \mathbb{E}(|e^{unif}|)$. Set α by

$(1 - \alpha)k_V = \mathbb{E}(T)$,

where T is a random variable distributed like $\mathcal{D}_V$. These requirements lead to the following choice of parameters


[Equations (23)–(25): the resulting choice of $k_U$, $k_V$ and $w$.]


Figure 8 gives the relative error weight $\omega = w/n$ as a function of $R$ for $\alpha = 0.7$, for Algorithm 2 with rejection sampling and the previous choice of parameters, asymptotically in $n$ when $k/n$ is fixed.


Fig. 8. Areas of relative signature distances with rejection sampling when q = 3 (vertical axis: w/n)



6 Security Proof 

We give in this section a security proof of the signature scheme $\mathcal{S}_{\mathrm{code}}$. This proof is very close in spirit to the security proof of [GPV08]. However, we do not reduce the security of $\mathcal{S}_{\mathrm{code}}$ to a collision problem, since the scheme requires a range of parameters where this problem becomes easy (for more details see §7.3).

Furthermore, we stress that this section is a slightly different version of [DST17b, Section 3], an unpublished paper. We include it here to keep the paper self-contained instead of referring repeatedly to results of [DST17b].


6.1 Basic Tools 

Basic Definitions. Recall that the statistical distance $\rho$ is defined in §2. We will need the following well-known property of the statistical distance, which can easily be proved by induction.

Proposition 4. Let $(\mathcal{D}_1^0, \dots, \mathcal{D}_n^0)$ and $(\mathcal{D}_1^1, \dots, \mathcal{D}_n^1)$ be two $n$-tuples of discrete probability distributions where $\mathcal{D}_i^0$ and $\mathcal{D}_i^1$ are distributed over the same space $\mathcal{E}_i$. We have for all positive integers $n$:

$$\rho\big(\mathcal{D}_1^0 \otimes \dots \otimes \mathcal{D}_n^0,\ \mathcal{D}_1^1 \otimes \dots \otimes \mathcal{D}_n^1\big) \le \sum_{i=1}^{n} \rho(\mathcal{D}_i^0, \mathcal{D}_i^1).$$
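Proposition 4 can be checked numerically on small examples. The sketch below is ours and assumes the usual definition $\rho(P, Q) = \frac{1}{2}\sum_x |P(x) - Q(x)|$ of the statistical distance; it computes both sides exactly for three pairs of toy distributions, and the helper names are hypothetical.

```python
from fractions import Fraction

def stat_dist(P, Q):
    """Statistical distance 1/2 * sum_x |P(x) - Q(x)| between two finite laws (assumed definition)."""
    support = set(P) | set(Q)
    return sum(abs(P.get(x, 0) - Q.get(x, 0)) for x in support) / 2

def product_law(laws):
    """Joint law of independent draws, one from each law in `laws`."""
    joint = {(): Fraction(1)}
    for D in laws:
        joint = {xs + (x,): p * D[x] for xs, p in joint.items() for x in D}
    return joint

# Toy tuples (D_1^0, D_2^0, D_3^0) and (D_1^1, D_2^1, D_3^1).
D0 = [{0: Fraction(1, 2), 1: Fraction(1, 2)},
      {0: Fraction(1, 4), 1: Fraction(3, 4)},
      {"a": Fraction(1)}]
D1 = [{0: Fraction(2, 3), 1: Fraction(1, 3)},
      {0: Fraction(1, 2), 1: Fraction(1, 2)},
      {"a": Fraction(1, 2), "b": Fraction(1, 2)}]

lhs = stat_dist(product_law(D0), product_law(D1))   # distance between the joint laws
rhs = sum(stat_dist(a, b) for a, b in zip(D0, D1))  # sum of coordinate-wise distances
print(lhs, rhs, lhs <= rhs)
```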

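A distinguisher between two distributions $\mathcal{D}^0$ and $\mathcal{D}^1$ over the same space $\mathcal{E}$ is a randomized algorithm which takes as input an element of $\mathcal{E}$ that follows the distribution $\mathcal{D}^0$ or $\mathcal{D}^1$ and outputs $b \in \{0,1\}$. It is characterized by its advantage:

$$\mathrm{Adv}^{\mathcal{D}^0,\mathcal{D}^1}(\mathcal{A}) = \mathbb{P}_{\xi \sim \mathcal{D}^0}(\mathcal{A}(\xi) \text{ outputs } 1) - \mathbb{P}_{\xi \sim \mathcal{D}^1}(\mathcal{A}(\xi) \text{ outputs } 1).$$

We call this quantity the advantage of $\mathcal{A}$ against $\mathcal{D}^0$ and $\mathcal{D}^1$.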

Definition 6 (Computational Distance and Indistinguishability). The computational distance between two distributions $\mathcal{D}^0$ and $\mathcal{D}^1$ in time $t$ is:

$$\rho_c(\mathcal{D}^0, \mathcal{D}^1)(t) = \max_{|\mathcal{A}| \le t} \left|\mathrm{Adv}^{\mathcal{D}^0,\mathcal{D}^1}(\mathcal{A})\right|$$

where $|\mathcal{A}|$ denotes the running time of $\mathcal{A}$ on its inputs.

The ensembles $\mathcal{D}^0 = (\mathcal{D}^0_n)$ and $\mathcal{D}^1 = (\mathcal{D}^1_n)$ are computationally indistinguishable in time $(t_n)$ if their computational distance in time $(t_n)$ is negligible in $n$.

In other words, the computational distance is the best advantage that any adversary could get in bounded time.


Digital Signature and Games. Let us recall the concept of signature schemes, the security model that will be considered in the following and, in this context, the paradigm of games in which we give a security proof of our scheme.

Definition 7 (Signature Scheme). A signature scheme $\mathcal{S}$ is a triple of algorithms Gen, Sgn and Vrfy which are defined as:

— The key generation algorithm Gen is a probabilistic algorithm which, given $1^\lambda$, where $\lambda$ is the security parameter, outputs a pair of matching public and private keys $(pk, sk)$;

— The signing algorithm is probabilistic and takes as input a message $m \in \{0,1\}^*$ to be signed and returns a signature $\sigma = \mathrm{Sgn}_{sk}(m)$;

— The verification algorithm takes as input a message $m$ and a signature $\sigma$. It returns $\mathrm{Vrfy}_{pk}(m, \sigma)$, which is 1 if the signature is accepted and 0 otherwise. It is required that $\mathrm{Vrfy}_{pk}(m, \sigma) = 1$ if $\sigma = \mathrm{Sgn}_{sk}(m)$.

For this kind of scheme, one of the strongest security notions is existential unforgeability under an adaptive chosen message attack (EUF-CMA). In this model the adversary has access to signatures of messages of its choice and its goal is to produce a valid forgery. A valid forgery is a message/signature pair $(m, \sigma)$ such that $\mathrm{Vrfy}_{pk}(m, \sigma) = 1$ whereas the signature of $m$ has never been requested by the forger. More precisely, the following definition gives the EUF-CMA security of a signature scheme:

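Definition 8 (EUF-CMA Security). Let $\mathcal{S}$ be a signature scheme.

A forger $\mathcal{A}$ is a $(t, q_{\mathrm{hash}}, q_{\mathrm{sign}}, \varepsilon)$-adversary in EUF-CMA against $\mathcal{S}$ if, after at most $q_{\mathrm{hash}}$ queries to the hash oracle, $q_{\mathrm{sign}}$ signature queries and $t$ working time, it outputs a valid forgery with probability at least $\varepsilon$. We define the EUF-CMA success probability against $\mathcal{S}$ as:

$$\mathrm{Succ}_{\mathcal{S}}^{\mathrm{EUF\text{-}CMA}}(t, q_{\mathrm{hash}}, q_{\mathrm{sign}}) = \max\left(\varepsilon \mid \text{there exists a } (t, q_{\mathrm{hash}}, q_{\mathrm{sign}}, \varepsilon)\text{-adversary}\right).$$

The signature scheme $\mathcal{S}$ is said to be $(t, q_{\mathrm{hash}}, q_{\mathrm{sign}})$-secure in EUF-CMA if the above success probability is a negligible function of the security parameter $\lambda$.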
The Game Associated to Our Code-Based Signature Scheme. The modern approach to proving the security of a cryptographic scheme is to relate the security of its primitives to well-known problems that are believed to be hard, by proving that breaking the cryptographic primitives provides a means to break one of these hard problems. In our case, the security of the signature scheme is defined as a game with an adversary that has access to hash and sign oracles. It will be helpful here to be more formal and to define more precisely the games we will consider. They are games between two players, an adversary and a challenger. In a game $G$, the challenger executes three kinds of procedures:

— an initialization procedure Initialize, which is called once at the beginning of the game;

— oracle procedures, which can be requested at the will of the adversary. In our case, there will be two, Hash and Sign. The adversary $\mathcal{A}$, which is an algorithm, may call Hash at most $q_{\mathrm{hash}}$ times and Sign at most $q_{\mathrm{sign}}$ times;

— a final procedure Finalize, which is executed once $\mathcal{A}$ has terminated. The output of $\mathcal{A}$ is given as input to this procedure.

The output of the game $G$, which is denoted $G(\mathcal{A})$, is the output of the finalization procedure (a bit $b \in \{0,1\}$). The game $G$ with $\mathcal{A}$ is said to be successful if $G(\mathcal{A}) = 1$. The standard approach for obtaining a security proof in a certain model is to construct a sequence of games such that: the success of the first game with an adversary $\mathcal{A}$ is exactly the success against the model of security; the difference of the probability of success between two consecutive games is negligible; and in the final game the probability of success is the probability for $\mathcal{A}$ to break one of the problems which are supposed to be hard. In this way, no adversary can break the claim of security with non-negligible success unless it breaks one of the problems that are supposed to be hard.

Definition 9 (Challenger Procedures in the EUF-CMA Game). The challenger procedures for the EUF-CMA Game corresponding to $\mathcal{S}_{\mathrm{code}}$ are defined as:

proc Initialize($\lambda$)
    $(pk, sk) \leftarrow \mathrm{Gen}(1^\lambda)$
    $H_{pk} \leftarrow pk$
    $(H_{sk}, P, S) \leftarrow sk$
    return $H_{pk}$

proc Hash($m, r$)
    return Hash($m, r$)

proc Sign($m$)
    $r \xleftarrow{\$} \{0,1\}^{\lambda_0}$
    $s \leftarrow$ Hash($m, r$)
    $e \leftarrow D_{H_{sk},w}(s(S^{-1})^T)$
    return $(eP, r)$

proc Finalize($m, e, r$)
    $s \leftarrow$ Hash($m, r$)
    return $eH_{pk}^T = s \wedge |e| = w$


6.2 Code-Based Problems 

We introduce in this subsection the code-based problems that will be used in the security proof. The first is Decoding One Out of Many (DOOM), which was first considered in [JJ02] and later analysed in [Sen11]. We will come back to the best known algorithms to solve this problem as a function of the distance $w$ in §7.

Problem 2 (DOOM - Decoding One Out of Many). Given $H \in \mathbb{F}_q^{(n-k)\times n}$, $s_1, \dots, s_N \in \mathbb{F}_q^{n-k}$, and an integer $w$, find $e \in \mathbb{F}_q^n$ and $i$, $1 \le i \le N$, such that $eH^T = s_i$ and $|e| = w$.

Definition 10 (One-Wayness of DOOM). We define the success of an algorithm $\mathcal{A}$ against DOOM with the parameters $n, k, N, w$ as:

$$\mathrm{Succ}_{\mathrm{DOOM}}^{n,k,N,w}(\mathcal{A}) = \mathbb{P}\big(\mathcal{A}(H, s_1, \dots, s_N) \text{ is a solution of DOOM}\big)$$

where $H$ is chosen uniformly at random in $\mathbb{F}_q^{(n-k)\times n}$, the $s_i$'s are chosen uniformly at random in $\mathbb{F}_q^{n-k}$, and the probability is taken over these choices of $H$, the $s_i$'s and the internal coins of $\mathcal{A}$.

The computational success in time $t$ of breaking DOOM with the parameters $n, k, N, w$ is then defined as:

$$\mathrm{Succ}_{\mathrm{DOOM}}^{n,k,N,w}(t) = \max_{|\mathcal{A}| \le t} \left\{\mathrm{Succ}_{\mathrm{DOOM}}^{n,k,N,w}(\mathcal{A})\right\}.$$
Another problem will appear in the security proof: distinguishing random codes from a code drawn uniformly at random in the family used for public keys in the signature scheme. The public keys, namely the matrices $H_{pk}$, follow a distribution over the parity-check matrices of size $(n - k_U - k_V) \times n$ which is described in §5.1. In the following we will denote this distribution by $\mathcal{D}_{\mathrm{pub}}$. On the other hand, $\mathcal{D}_{\mathrm{rand}}$ will denote the uniform distribution over the parity-check matrices of all $[n,k]$-codes with $k = k_U + k_V$. We will discuss the difficulty of distinguishing $\mathcal{D}_{\mathrm{pub}}$ from $\mathcal{D}_{\mathrm{rand}}$ in §8. Let us recall that the syndromes associated to matrices $H_{pk}$ are indistinguishable in a very strong sense from random syndromes, as Proposition 3 (see §5.1) shows in the case $q = 3$.

6.3 EUF-CMA Security Proof 

This subsection is devoted to our security reduction and its proof. We give it in the case $q = 3$. Let us first introduce some notation. We will denote by $\mathcal{D}_w$ the output distribution of Algorithm 2. Furthermore, we will denote Algorithm 2 by $D_{H_{sk},w}$ for a secret key $H_{sk}$. Recall that $\mathcal{U}_w$ is the uniform distribution over $\mathcal{S}_w$ (the set of words of weight $w$ in $\mathbb{F}_3^n$), $\mathcal{D}_{\mathrm{pub}}$ is the distribution of public keys, $\mathcal{D}_{\mathrm{rand}}$ is the uniform distribution over parity-check matrices of all $[n,k]$-codes, and $\mathcal{S}_{\mathrm{code}}$ is our signature scheme defined in §3.1 with the family of generalized admissible $(U, U+V)$-codes (Definition 2 in §4.2).

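Theorem 2 (Security Reduction). Let $q_{\mathrm{hash}}$ (resp. $q_{\mathrm{sign}}$) be the number of queries to the hash (resp. signing) oracle. We assume that $\lambda_0 = \lambda + 2\log_2(q_{\mathrm{sign}})$ where $\lambda$ is the security parameter of the signature scheme. We have in the random oracle model (ROM), for all time $t$:

$$\mathrm{Succ}_{\mathcal{S}_{\mathrm{code}}}^{\mathrm{EUF\text{-}CMA}}(t, q_{\mathrm{hash}}, q_{\mathrm{sign}}) \le 2\,\mathrm{Succ}_{\mathrm{DOOM}}^{n,k,q_{\mathrm{hash}},w}(t_c) + \rho_c(\mathcal{D}_{\mathrm{rand}}, \mathcal{D}_{\mathrm{pub}})(t_c) + q_{\mathrm{sign}}\,\rho(\mathcal{D}_w, \mathcal{U}_w) + \frac{1}{2}\,q_{\mathrm{hash}}\sqrt{\varepsilon} + \frac{1}{2^{\lambda}}$$

where $t_c = t + O(q_{\mathrm{hash}} \cdot n^2)$ and $\varepsilon$ is given in Proposition 3.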

Proof. Let $\mathcal{A}$ be a $(t, q_{\mathrm{hash}}, q_{\mathrm{sign}}, \varepsilon)$-adversary in the EUF-CMA model against $\mathcal{S}_{\mathrm{code}}$ and let $(H_0, s_1, \dots, s_{q_{\mathrm{hash}}})$ be drawn uniformly at random among all instances of DOOM for parameters $n, k, q_{\mathrm{hash}}, w$. We stress here that the syndromes $s_j$ are random and independent vectors of $\mathbb{F}_3^{n-k}$. We write $\mathbb{P}(S_i)$ to denote the probability of success for $\mathcal{A}$ of game $G_i$.

Game 0 is the EUF-CMA game for $\mathcal{S}_{\mathrm{code}}$.

Game 1 is identical to Game 0 unless the following failure event $F$ occurs: there is a collision in a signature query (i.e. two signature queries for the same message $m$ lead to the same salt $r$). By using the difference lemma (see for instance [Sho04, Lemma 1]) we get:

$$\mathbb{P}(S_0) \le \mathbb{P}(S_1) + \mathbb{P}(F).$$

The following lemma (see A.2 for a proof) shows that in our case, as $\lambda_0 = \lambda + 2\log_2(q_{\mathrm{sign}})$, the probability of the event $F$ is negligible.

Lemma 3. For $\lambda_0 = \lambda + 2\log_2(q_{\mathrm{sign}})$ we have: $\mathbb{P}(F) \le \frac{1}{2^{\lambda}}$.
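The event $F$ only depends on the salts: it requires two of the at most $q_{\mathrm{sign}}$ salts drawn by Sign to coincide. The usual birthday argument therefore bounds its probability by $\binom{q_{\mathrm{sign}}}{2}/2^{\lambda_0} \le q_{\mathrm{sign}}^2/2^{\lambda_0+1}$, which equals $2^{-\lambda-1}$ for $\lambda_0 = \lambda + 2\log_2(q_{\mathrm{sign}})$. The sketch below, which is ours and not the proof of A.2, compares this bound with the exact collision probability for toy values ($\lambda = 20$, $q_{\mathrm{sign}} = 64$) chosen only for illustration; `collision_probability` is a hypothetical helper.

```python
from fractions import Fraction
from math import comb

def collision_probability(num_salts, salt_bits):
    """Exact probability that num_salts uniform salt_bits-bit salts are not all distinct."""
    N = 2 ** salt_bits
    p_distinct = Fraction(1)
    for i in range(num_salts):
        p_distinct *= Fraction(N - i, N)
    return 1 - p_distinct

lam, q_sign = 20, 64                          # toy values, q_sign a power of two
lam0 = lam + 2 * (q_sign.bit_length() - 1)    # lambda + 2 log2(q_sign)
p_exact = collision_probability(q_sign, lam0)
birthday = Fraction(comb(q_sign, 2), 2 ** lam0)
print(float(p_exact), float(birthday), 2.0 ** -(lam + 1))
```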

Game 2 is modified from Game 1 as follows:

To each message $m$ we associate a list $L_m$ containing $q_{\mathrm{sign}}$ random elements of $\mathbb{F}_2^{\lambda_0}$. It is constructed the first time it is needed. The call $r \in L_m$ returns true if and only if $r$ is in the list. The call $L_m$.next() returns the elements of $L_m$ sequentially. The list is large enough to satisfy all queries.

proc Hash($m, r$)
    if $r \in L_m$
        $e_{m,r} \xleftarrow{\$} \mathcal{S}_w$
        return $e_{m,r} H_{pk}^T$
    else
        $j \leftarrow j + 1$
        return $s_j$

proc Sign($m$)
    $r \leftarrow L_m$.next()
    $s \leftarrow$ Hash($m, r$)
    $e \leftarrow D_{H_{sk},w}(s(S^{-1})^T)$
    return $(eP, r)$
The Hash procedure now creates the list $L_m$ if needed; then, if $r \in L_m$, it returns $e_{m,r} H_{pk}^T$ with $e_{m,r} \xleftarrow{\$} \mathcal{S}_w$. Although we do not use it in this game, we remark that $(e_{m,r}, r)$ is a valid signature for $m$. The error value is stored. If $r \notin L_m$, it outputs one of the $s_j$ of the instance $(H_0, s_1, \dots, s_{q_{\mathrm{hash}}})$ of the DOOM problem. The Sign procedure is unchanged, except for $r$, which is now taken in $L_m$. The global index $j$ is set to 0 in proc Initialize.
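A direct way to realize the lists $L_m$ of Game 2 in code is a dictionary filled on first use. The Python class below is our own illustrative sketch: class and method names are hypothetical, `random.getrandbits` merely stands in for the challenger's randomness, and each list is stored explicitly, which is simpler than (and not the same as) the emulation described in the appendix.

```python
import random

class SaltLists:
    """Hypothetical sketch of the lazily built lists L_m (q_sign salts of lam0 bits each)."""

    def __init__(self, q_sign, lam0, seed=None):
        self.q_sign, self.lam0 = q_sign, lam0
        self.rng = random.Random(seed)
        self.lists = {}     # m -> list of salts, in the order next() hands them out
        self.members = {}   # m -> set of salts, for O(1) membership tests
        self.cursor = {}    # m -> index of the next salt returned by next()

    def _get(self, m):
        if m not in self.lists:   # L_m is constructed the first time it is needed
            salts = [self.rng.getrandbits(self.lam0) for _ in range(self.q_sign)]
            self.lists[m], self.members[m], self.cursor[m] = salts, set(salts), 0
        return self.lists[m]

    def contains(self, m, r):
        """The call `r in L_m`."""
        self._get(m)
        return r in self.members[m]

    def next(self, m):
        """The call L_m.next(): returns the elements of L_m sequentially."""
        salts = self._get(m)
        r = salts[self.cursor[m]]
        self.cursor[m] += 1
        return r

L = SaltLists(q_sign=4, lam0=32, seed=0)
r1, r2 = L.next("msg"), L.next("msg")   # salts used by two Sign("msg") queries
print(L.contains("msg", r1), L.contains("msg", r2))   # True True
```

At most $q_{\mathrm{sign}}$ calls to next() are made in the game, so the list never runs out.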

We can relate this game to the previous one through the following lemma.

Lemma 4.

$$\mathbb{P}(S_1) \le \mathbb{P}(S_2) + \frac{q_{\mathrm{hash}}}{2}\sqrt{\varepsilon}$$

where $\varepsilon$ is given in Proposition 3.

The proof of this lemma is given at the end of Appendix B and relies, among other things, on the following points:

— Proposition 4;

— syndromes produced by matrices $H_{pk}$ with errors of weight $w$ have average statistical distance from the uniform distribution over $\mathbb{F}_3^{n-k}$ at most $\frac{1}{2}\sqrt{\varepsilon}$ (see Proposition 3).

Game 3 differs from Game 2 by replacing, in proc Sign, “$e \leftarrow D_{H_{sk},w}(s(S^{-1})^T)$” with “$e \leftarrow e_{m,r}$” and “return $(eP, r)$” with “return $(e, r)$”. Any signature $(e, r)$ produced by proc Sign is valid. The error $e$ is drawn according to the uniform distribution $\mathcal{U}_w$, while previously it was drawn according to the output distribution of Algorithm 2, that is $\mathcal{D}_w$. By using Proposition 4 it follows that

$$\mathbb{P}(S_2) \le \mathbb{P}(S_3) + q_{\mathrm{sign}}\,\rho(\mathcal{U}_w, \mathcal{D}_w).$$

Game 4 is the game where we replace the public matrix $H_{pk}$ by $H_0$. In this way we will force the adversary to build a solution of the DOOM problem. Here, if a difference is detected between the games, it gives a distinguisher between the distributions $\mathcal{D}_{\mathrm{rand}}$ and $\mathcal{D}_{\mathrm{pub}}$:

$$\mathbb{P}(S_3) \le \mathbb{P}(S_4) + \rho_c(\mathcal{D}_{\mathrm{pub}}, \mathcal{D}_{\mathrm{rand}})(t_c).$$

We show in the appendix how to emulate the lists $L_m$ in such a way that the cost of the list operations, including construction, is at most linear in the security parameter $\lambda$. Since $\lambda \le n$, it follows that the cost of a call to proc Hash cannot exceed $O(n^2)$ and the running time of the challenger is $t_c = t + O(q_{\mathrm{hash}} \cdot n^2)$.


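Game 5 differs in the finalize procedure:

proc Finalize($m, e, r$)
    $s \leftarrow$ Hash($m, r$)
    $b \leftarrow eH_{pk}^T = s \wedge |e| = w$
    return $b \wedge r \notin L_m$

We assume the forger outputs a valid signature $(e, r)$ for the message $m$. The probability of success of Game 5 is the probability of the event “$S_4 \wedge (r \notin L_m)$”.

If the forgery is valid, the message $m$ has never been queried by Sign, and the adversary never had access to any element of the list $L_m$. Hence the two events are independent and we get:

$$\mathbb{P}(S_5) = (1 - 2^{-\lambda_0})^{q_{\mathrm{sign}}}\,\mathbb{P}(S_4).$$

As we assumed $\lambda_0 = \lambda + 2\log_2(q_{\mathrm{sign}})$, we have $q_{\mathrm{sign}}/2^{\lambda_0} = 1/(2^{\lambda} q_{\mathrm{sign}}) \le 1/2$, and therefore, by Bernoulli's inequality:

$$(1 - 2^{-\lambda_0})^{q_{\mathrm{sign}}} \ge 1 - \frac{q_{\mathrm{sign}}}{2^{\lambda_0}} \ge \frac{1}{2}.$$

Therefore

$$\mathbb{P}(S_5) \ge \frac{1}{2}\,\mathbb{P}(S_4). \qquad (26)$$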
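The chain of inequalities leading to (26) can be checked exactly for small parameters. In this sketch the values of $\lambda$ and $q_{\mathrm{sign}}$ are arbitrary toy choices, with $q_{\mathrm{sign}}$ a power of two so that $\lambda_0$ is an integer, and `keep_probability` is a hypothetical helper name.

```python
from fractions import Fraction

def keep_probability(q_sign, lam0):
    """(1 - 2^-lam0)^q_sign: probability that a fixed salt avoids q_sign random lam0-bit salts."""
    return (1 - Fraction(1, 2 ** lam0)) ** q_sign

checks = []
for lam in (1, 4, 8):
    for q_sign in (1, 2, 16, 256):
        lam0 = lam + 2 * (q_sign.bit_length() - 1)   # lambda + 2 log2(q_sign)
        keep = keep_probability(q_sign, lam0)
        bernoulli = 1 - Fraction(q_sign, 2 ** lam0)  # Bernoulli lower bound
        checks.append(keep >= bernoulli >= Fraction(1, 2))
print(all(checks))
```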

The probability $\mathbb{P}(S_5)$ is then exactly the probability for $\mathcal{A}$ to output $e_j \in \mathcal{S}_w$ such that $e_j H_0^T = s_j$ for some $j$, which gives

$$\mathbb{P}(S_5) \le \mathrm{Succ}_{\mathrm{DOOM}}^{n,k,q_{\mathrm{hash}},w}(t_c). \qquad (27)$$

(26) together with (27) imply that

$$\mathbb{P}(S_4) \le 2 \cdot \mathrm{Succ}_{\mathrm{DOOM}}^{n,k,q_{\mathrm{hash}},w}(t_c).$$

This concludes the proof of Theorem 2, by combining this bound with all the bounds obtained for the previous games.

7 Hardness of Finding Errors of Large Weight with Prescribed Syndrome

We consider the syndrome decoding problem.

Problem 3 (Syndrome Decoding). Given $H \in \mathbb{F}_q^{(n-k)\times n}$, $s \in \mathbb{F}_q^{n-k}$, and an integer $w$, find $e \in \mathbb{F}_q^n$ such that $eH^T = s$ and $|e| = w$.

This problem is NP-hard [BMvT78], even when the weight $w$ is fixed to any proportion of $n$ [Ste93a]. We discuss here the best solvers, with a focus on non-binary alphabets and weights close to $n$. In the sequel, we will denote by SD($H, s, w$) an instance of this problem.

7.1 Solving Syndrome Decoding for High Weights 

In code-based cryptography, Problem 3 is usually considered for “small” values of $w$. The best known solvers derive from Algorithm 5 [Pra62], often referred to as Information Set Decoding (ISD). As it is described here, it repeatedly runs a polynomial time step until it outputs $e$ of weight $w$.


Algorithm 5 Prange($H, s$)

Parameters ($q, n, k, w$)

Require: $H \in \mathbb{F}_q^{(n-k)\times n}$, $s \in \mathbb{F}_q^{n-k}$

Ensure: $eH^T = s$ and $|e| = w$
1: repeat
2:   $\mathcal{I} \leftarrow$ InfoSet($H$)
3:   $x \xleftarrow{\$} \{x \in \mathbb{F}_q^n \mid |x_{\mathcal{I}}| = t\}$   ▷ $t = \min\left(k, \max\left(0, w - \frac{q-1}{q}(n-k)\right)\right)$
4:   $e \leftarrow$ PrangeStep($H, s, \mathcal{I}, x$)
5: until $|e| = w$
6: return $e$

The functions InfoSet(·) and PrangeStep(·) are defined in Algorithm 1.


On input $M \in \mathbb{F}_q^{a\times b}$, the call GaussElim($M$) returns an $a \times b$ matrix $M'$ starting with an $a \times a$ identity block and such that, for some non-singular matrix $S$, we have $SM = M'$. If $M'$ does not exist, the call fails. The symbol | denotes matrix concatenation.
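To make Algorithm 5 concrete, here is a toy Python version over $\mathbb{F}_3$ for very small parameters. It is a sketch and not the authors' code: Algorithm 1 (InfoSet, PrangeStep) is not reproduced in this section, so we replace it by a random column permutation followed by Gaussian elimination over $\mathbb{F}_3$ on the $n - k$ remaining columns; $t$ is rounded to an integer; and all function names are hypothetical.

```python
import random

Q = 3  # ternary alphabet

def solve_mod3(A, b):
    """Solve A y = b over F_3 for a square matrix A; return None if A is singular."""
    m = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(m):
        piv = next((r for r in range(col, m) if M[r][col] % Q), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        inv = M[col][col]  # in F_3, 1 and 2 are their own inverses
        M[col] = [(v * inv) % Q for v in M[col]]
        for r in range(m):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(a - f * c) % Q for a, c in zip(M[r], M[col])]
    return [M[r][m] for r in range(m)]

def syndrome(H, e):
    """e H^T over F_3."""
    return [sum(h * x for h, x in zip(row, e)) % Q for row in H]

def weight(e):
    return sum(1 for x in e if x)

def prange(H, s, w, rng):
    """Toy Prange ISD over F_3: returns e with e H^T = s and |e| = w."""
    r, n = len(H), len(H[0])          # r = n - k
    k = n - r
    t = min(k, max(0, round(w - (Q - 1) * r / Q)))  # weight put on the information set
    while True:
        perm = rng.sample(range(n), n)
        J, I = perm[:r], perm[r:]      # I: candidate information set of size k
        x = {i: 0 for i in I}
        for i in rng.sample(I, t):     # random x_I of weight t
            x[i] = rng.choice((1, 2))
        # Solve H_J e_J^T = s - H_I x_I^T for the n - k remaining coordinates.
        rhs = [(s[j] - sum(H[j][i] * x[i] for i in I)) % Q for j in range(r)]
        y = solve_mod3([[H[j][c] for c in J] for j in range(r)], rhs)
        if y is None:
            continue                   # H_J singular: I is not an information set
        e = [0] * n
        for i in I:
            e[i] = x[i]
        for c, v in zip(J, y):
            e[c] = v
        if weight(e) == w:
            return e

rng = random.Random(2024)
n, k, w = 12, 6, 10
# Toy systematic parity-check matrix [I_{n-k} | R], so rank(H) = n - k.
H = [[int(i == j) for j in range(n - k)] + [rng.randrange(Q) for _ in range(k)]
     for i in range(n - k)]
# Planted weight-w word: guarantees that a solution reachable by the loop exists.
e0 = [0, 0] + [rng.choice((1, 2)) for _ in range(n - 2)]
s = syndrome(H, e0)
e = prange(H, s, w, rng)
print(e, syndrome(H, e) == s, weight(e))
```

Here $w = 10$ lies in the high-weight range $w > k + \frac{2}{3}(n-k) = 10$ only at its boundary, so each iteration succeeds with constant probability, in line with the discussion that follows.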

Dumer's variant of ISD [Dum91] is very similar to Stern's variant [Ste88]. We will limit the description and analysis to instances of Problem 3 with $q \ge 3$ and $w > k + \frac{q-1}{q}(n-k)$. On input $M \in \mathbb{F}_q^{a\times b}$, the call GaussElimPartial$_\ell$($M$) returns an $a \times b$ matrix $M'$ whose first $a - \ell$ rows start with an $(a-\ell)\times(a-\ell)$ identity block, whose last $\ell$ rows start with an $\ell \times (a-\ell)$ zero block, and such that, for some non-singular matrix $S$, we have $SM = M'$. If $M'$ does not exist, the call fails. The call CollisionSearch($H'', s'', \mathcal{E}_1, \mathcal{E}_2$) returns the set $\mathcal{E} = \{e \in \mathcal{E}_1 \times \mathcal{E}_2 \mid eH''^T = s''\}$.


7.2 Complexity Analysis 


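We want to estimate the expected running time of Algorithm 5 and Algorithm 6 when their input $(H, s)$ is drawn uniformly at random in

$$\mathcal{I}_{q,n,k,w} = \left\{(H, s) \in \mathbb{F}_q^{(n-k)\times n} \times \mathbb{F}_q^{n-k} \;\middle|\; \mathrm{rank}(H) = n-k,\ s \in \{eH^T \mid e \in \mathbb{F}_q^n,\ |e| = w\}\right\}.$$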




28 


Thomas Debris-Alazard, Nicolas Sendrier, and Jean-Pierre Tillich 


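Algorithm 6 Dumer($H, s$)

Parameters ($q, n, k, w$), $q \ge 3$, $w > k + \frac{q-1}{q}(n-k)$

Additional parameters $p$, $\ell$ and $L$, tuned for each $(q, n, k, w)$ such that $0 \le p \le n - w$, $0 \le \ell \le w - k + p$, and $1 \le L \le \binom{(k+\ell)/2}{p/2}(q-1)^{(k+\ell-p)/2}$

Require: $H \in \mathbb{F}_q^{(n-k)\times n}$, $s \in \mathbb{F}_q^{n-k}$

Ensure: $eH^T = s$ and $|e| = w$
1: loop
2:   $\mathcal{I} \leftarrow$ ExtInfoSet($H, \ell$)
3:   $S, H', H'' \leftarrow$ Decomp($H, \mathcal{I}$)
4:   $(s', s'') \leftarrow sS^T$
5:   $\mathcal{E} \leftarrow$ BirthdaySD($H'', s'', k + \ell - p, \mathcal{I}, L$)
6:   for $e' \in \mathcal{E}$ do
7:     $e \leftarrow$ PrangeStep($H', s', \mathcal{I}, e'$)
8:     if $|e| = w$ then
9:       return $e$

The functions InfoSet(·) and PrangeStep(·) are defined in Algorithm 1.

function ExtInfoSet($H, \ell$) — extended information set
Require: $H \in \mathbb{F}_q^{(n-k)\times n}$, $\ell \ge 0$ integer
Ensure: returns $\mathcal{I}$, a set of $k + \ell$ coordinate indices containing an information set of $\langle H\rangle^\perp$

function Decomp($H, \mathcal{I}$) — shorten and supplement
Require: $H \in \mathbb{F}_q^{(n-k)\times n}$, $\mathcal{I}$ contains an information set of $\langle H\rangle^\perp$ and $|\mathcal{I}| = k + \ell$
Ensure: returns $S, H', H'' \in \mathbb{F}_q^{(n-k)\times(n-k)} \times \mathbb{F}_q^{(n-k-\ell)\times n} \times \mathbb{F}_q^{\ell\times n}$ such that $S$ is non-singular, $\mathrm{Supp}(H'') \subset \mathcal{I}$,

The space $\langle H''\rangle$ is a shortened code: it consists of all words of $\langle H\rangle$ whose support is included in $\mathcal{I}$. The space $\langle H'\rangle$ supplements $\langle H''\rangle$ in $\langle H\rangle$. The fact that $\mathcal{I}$ is an information set of $\langle H\rangle^\perp$ guarantees the existence of

function BirthdaySD($H, s, w, \mathcal{I}, L$) — birthday syndrome decoding
Require: $H \in \mathbb{F}_q^{\ell\times n}$, $s \in \mathbb{F}_q^{\ell}$, $w$ an integer, $0 \le w \le |\mathcal{I}|$, $\mathrm{Supp}(H) \subset \mathcal{I}$ a set of coordinate indices, $L$ an integer, $1 \le L \le \binom{|\mathcal{I}|/2}{w/2}(q-1)^{w/2}$
Ensure: $\mathcal{E} \subset \{e \in \mathbb{F}_q^n \mid |e| = w,\ eH^T = s,\ \mathrm{Supp}(e) \subset \mathcal{I}\}$
1: $\mathcal{I}_1 \cup \mathcal{I}_2 \leftarrow \mathcal{I}$   ▷ disjoint union, split $\mathcal{I}$ evenly
2: for $i = 1, 2$ do $\mathcal{E}_i \leftarrow$ subset of $L$ elements of $\{e \in \mathbb{F}_q^n \mid |e| = w/2,\ \mathrm{Supp}(e) \subset \mathcal{I}_i\}$
3: $\mathcal{E} \leftarrow \{e \in \mathcal{E}_1 \times \mathcal{E}_2 \mid eH^T = s\}$
4: return $\mathcal{E}$

We do not detail instruction 3:. We will admit in the analysis that $\mathcal{E}$ has cardinality $L^2 q^{-\ell}$ on average and can be obtained for a cost $\max(|\mathcal{E}_1|, |\mathcal{E}_2|, |\mathcal{E}|) = \max(L, L^2 q^{-\ell})$ up to a polynomial factor.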
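Instruction 3: of BirthdaySD is left unspecified. A common way to implement such a step is to index one list by the syndrome it still needs and to look up the other list in a hash table, which matches the $\max(L, L^2 q^{-\ell})$ cost admitted above. The Python sketch below follows that idea over $\mathbb{F}_3$ on toy sizes. The hash-table matching is our choice of technique, the lists are sampled from full enumerations (only workable for tiny parameters), for odd $w$ we put $\lfloor w/2 \rfloor$ and $\lceil w/2 \rceil$ on the two halves, and all names are hypothetical.

```python
import itertools
import random
from collections import defaultdict

Q = 3

def syn(H, e):
    """e H^T over F_3 (H given as a list of rows)."""
    return tuple(sum(h * x for h, x in zip(row, e)) % Q for row in H)

def weight_vectors(n, positions, wt):
    """All vectors of F_3^n of weight wt supported on `positions`."""
    for supp in itertools.combinations(positions, wt):
        for vals in itertools.product((1, 2), repeat=wt):
            e = [0] * n
            for i, v in zip(supp, vals):
                e[i] = v
            yield tuple(e)

def birthday_sd(H, s, w, I, L, rng):
    """Toy BirthdaySD: words e with |e| = w, e H^T = s and Supp(e) inside I."""
    n = len(H[0])
    I = list(I)
    rng.shuffle(I)
    I1, I2 = I[: len(I) // 2], I[len(I) // 2:]     # disjoint union, split I evenly
    E1 = rng.sample(list(weight_vectors(n, I1, w // 2)), L)
    E2 = rng.sample(list(weight_vectors(n, I2, w - w // 2)), L)
    # Index E2 by s - e2 H^T; a match with e1 H^T gives (e1 + e2) H^T = s.
    table = defaultdict(list)
    for e2 in E2:
        table[tuple((a - b) % Q for a, b in zip(s, syn(H, e2)))].append(e2)
    # Supports of e1 and e2 are disjoint, so a + b stays in {0, 1, 2}.
    return [tuple(a + b for a, b in zip(e1, e2))
            for e1 in E1 for e2 in table.get(syn(H, e1), [])]

rng = random.Random(7)
ell, n = 3, 10
H = [[rng.randrange(Q) for _ in range(n)] for _ in range(ell)]
s = tuple(rng.randrange(Q) for _ in range(ell))
E = birthday_sd(H, s, 4, range(8), L=24, rng=rng)
print(len(E))   # about L^2 / q^ell = 576 / 27 on average
```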


Drawing the input from $\mathcal{I}_{q,n,k,w}$ forces the problem to admit a solution and corresponds to the typical situation in cryptanalysis.


Proposition 5 (Complexity of Prange's Algorithm). We consider Algorithm 5 with parameters $(q, n, k, w)$ and an input $(H, s)$ drawn uniformly at random in $\mathcal{I}_{q,n,k,w}$. Up to a polynomial factor, the algorithm has an expected running time equal to $\mathrm{WF}_{\mathrm{Prange}}(q, n, k, w) = 1/P_0$, where

$$P_0 = \frac{\binom{n-k}{w-t}(q-1)^{w-t}}{\min\left(q^{n-k}, \binom{n}{w}(q-1)^w\right)} \quad\text{with}\quad t = \begin{cases} 0 & \text{if } w < \frac{q-1}{q}(n-k), \\ k & \text{if } w > k + \frac{q-1}{q}(n-k), \\ w - \frac{q-1}{q}(n-k) & \text{else,} \end{cases} \qquad (28)$$

is the probability, up to a constant factor, that the instruction 5: produces a word of weight $w$.
Lemma 5. For any $E \subset \mathbb{F}_q^n$, we have, on average over all $H \in \mathbb{F}_q^{(n-k)\times n}$,

$$\frac{1}{2} \le \frac{|\{eH^T \mid e \in E\}|}{\min(q^{n-k}, |E|)} \le 1.$$

Proof (of Proposition 5). We use the notation of Algorithm 5. The proof is similar to the proof of Proposition 2. The main difference is that we choose $(H, s)$ in $\mathcal{I}_{q,n,k,w}$ instead of $\mathbb{F}_q^{(n-k)\times n} \times \mathbb{F}_q^{n-k}$. Choosing $H$ of full rank has a negligible impact on the probabilities. The choice of $s$ has an impact, as we choose it in the set $\{eH^T \mid e \in \mathbb{F}_q^n, |e| = w\}$, which might differ significantly from $\mathbb{F}_q^{n-k}$. From the above lemma, its size is equal to $\min\left(q^{n-k}, \binom{n}{w}(q-1)^w\right)$ up to a factor at most 2. In the formula of Proposition 2 giving this probability, we replace the denominator $q^{n-k}$ by $\min\left(q^{n-k}, \binom{n}{w}(q-1)^w\right)$. Also, in Algorithm 5 the weight $t$ has a fixed value depending on the parameters $q, n, k, w$, and finally the probability to reach $|e| = w$ is, from that formula, equal to

$$P_0 = \mathbb{P}(|e| = w) = \frac{\binom{n-k}{w-t}(q-1)^{w-t}}{\min\left(q^{n-k}, \binom{n}{w}(q-1)^w\right)}$$

up to a small constant. We will thus iterate on average $1/P_0$ times an elementary step of polynomial cost, dominated by the linear algebra in PrangeStep(·). This concludes the proof.
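Formula (28) is easy to evaluate with exact integer arithmetic. The sketch below computes $\log_2(1/P_0)$, i.e. the Prange work factor of Proposition 5 up to polynomial factors; rounding $t$ to an integer and flooring the work factor at one iteration are our own choices, and the function names are hypothetical. The last line illustrates the binary symmetry $w \leftrightarrow n - w$ discussed below.

```python
from fractions import Fraction
from math import comb, log2

def prange_t(q, n, k, w):
    """Weight t on the information set, as in (28), rounded to an integer."""
    return min(k, max(0, round(w - Fraction(q - 1, q) * (n - k))))

def prange_p0(q, n, k, w):
    """P_0 of Proposition 5: success probability of one iteration, up to a constant."""
    t = prange_t(q, n, k, w)
    num = comb(n - k, w - t) * (q - 1) ** (w - t)
    den = min(q ** (n - k), comb(n, w) * (q - 1) ** w)
    return Fraction(num, den)

def log2_wf_prange(q, n, k, w):
    """log2 of 1/P_0, floored at 0 (at least one iteration); exact big-int logs."""
    p0 = prange_p0(q, n, k, w)
    return max(0.0, log2(p0.denominator) - log2(p0.numerator))

print(round(log2_wf_prange(3, 300, 150, 250), 1))   # middle regime: small
print(round(log2_wf_prange(3, 300, 150, 30), 1))    # low weight: exponential
print(prange_p0(2, 100, 50, 20) == prange_p0(2, 100, 50, 80))   # True
```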

When $t = w - \frac{q-1}{q}(n-k)$ in (28), the probability $P_0$ is proportional to $1/\sqrt{n-k}$; in all other cases it is exponentially small in $n$. For a fixed code rate $R = k/n$ and error rate $\omega = w/n$, the following limit is defined and is called the asymptotic exponent:

$$c_q(R, \omega) = \lim_{n\to\infty} \frac{1}{n}\log_2 \mathrm{WF}_{\mathrm{Prange}}(q, n, Rn, \omega n).$$


We give this exponent for $R = 1/2$ and $q \in \{2, 3\}$ in Figure 9. The exponent is zero for middle values of $\omega$: the corresponding error weights can be reached in polynomial time in $n$. We remark that for $q = 2$ the problem is symmetric with respect to the error weight $w$. In that case, finding a word of weight $w$ with prescribed syndrome is exactly as hard as finding a word of weight $n - w$ with prescribed syndrome. Indeed, for $q = 2$, solving the instance $(H, s, w)$ of Problem 3 is the same thing as solving the instance $(H, s + \mathbf{1}H^T, n - w)$. For larger $q$, the symmetry disappears. The singular points correspond to the solutions of $\binom{n}{w}(q-1)^w = q^{n-k}$. On the left-hand side it is the Gilbert-Varshamov distance. On the right-hand side it exists only if $k/n < 1 - \log_q(q-1)$; for instance, for $q = 3$, only if $k/n < 1 - 1/\log_2 3 \approx 0.37$.
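Taking logarithms in (28) with the standard estimate $\log_q\left(\binom{N}{K}(q-1)^K\right) \approx N H_q(K/N)$, where $H_q(x) = -x\log_q\frac{x}{q-1} - (1-x)\log_q(1-x)$ is the $q$-ary entropy, gives a closed form for the exponent. The Python sketch below is our own derivation from (28), not a formula taken from the paper, and the names are hypothetical; it reproduces the two facts stated above, namely that the exponent vanishes in the middle range and that for $q = 2$ it is symmetric under $\omega \mapsto 1 - \omega$.

```python
from math import log, log2

def Hq(x, q):
    """q-ary entropy, with the convention 0 log 0 = 0."""
    h = 0.0
    if x > 0:
        h -= x * log(x / (q - 1), q)
    if x < 1:
        h -= (1 - x) * log(1 - x, q)
    return h

def prange_exponent(q, R, omega):
    """Asymptotic exponent c_q(R, omega) of Prange's algorithm, derived here from (28)."""
    theta = min(R, max(0.0, omega - (q - 1) / q * (1 - R)))   # t / n
    log_num = (1 - R) * Hq((omega - theta) / (1 - R), q)      # (1/n) log_q of the numerator
    log_den = min(1 - R, Hq(omega, q))                        # (1/n) log_q of the denominator
    return max(0.0, (log_den - log_num) * log2(q))

for omega in (0.1, 0.5, 0.9, 1.0):
    print(omega, round(prange_exponent(3, 0.5, omega), 4))
```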


Fig. 9. Asymptotic exponent of Prange's algorithm for code rate $R \in \{1/4, 1/2\}$ and $q \in \{2, 3\}$



In the sequel, low weight will refer to the left-hand side of Figure 9, with $w < \frac{q-1}{q}(n-k)$, which corresponds to the traditional syndrome decoding parameters, while high weight will refer to the right-hand side of Figure 9, with $w > k + \frac{q-1}{q}(n-k)$, corresponding to the parameters of interest in this work.


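Proposition 6 (Complexity of Dumer's Algorithm). We consider Algorithm 6 with parameters $(q, n, k, w, p, \ell, L)$ and an input $(H, s)$ drawn uniformly at random in $\mathcal{I}_{q,n,k,w}$. Up to a polynomial factor, the algorithm has an expected running time equal to

$$\mathrm{WF}_{p,\ell,L}(q, n, k, w) = \max\left(L, \frac{L^2}{q^\ell}\right) \max\left(1, \frac{q^\ell}{P_{p,\ell}L^2}\right) \qquad (29)$$

where

$$P_{p,\ell} = \frac{q^\ell \binom{n-k-\ell}{w-t}(q-1)^{w-t}}{\min\left(q^{n-k}, \binom{n}{w}(q-1)^w\right)}, \quad t = k + \ell - p, \qquad (30)$$

is the probability, up to a constant factor, that the instruction 7: produces a word of weight $w$.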


Proof. Any $e' \in \mathcal{E}$ has weight $t = k + \ell - p$ and it is completed by PrangeStep(·) with $n - k - \ell$ random coordinates. We want to estimate the probability that this completion gives a word of weight $w$. We apply Proposition 2 with $H' \in \mathbb{F}_q^{(n-k-\ell)\times n}$ and $s' \in \mathbb{F}_q^{n-k-\ell}$, with a distribution of $|e'|$ which is concentrated on $t = k + \ell - p$. The denominator is the number of possible input syndromes $s$ divided by $q^\ell$, because $s'' \in \mathbb{F}_q^\ell$ is fixed, that is $q^{-\ell}\min\left(q^{n-k}, \binom{n}{w}(q-1)^w\right)$. We obtain

$$P_{p,\ell} = \frac{\binom{n-k-\ell}{w-t}(q-1)^{w-t}}{q^{-\ell}\min\left(q^{n-k}, \binom{n}{w}(q-1)^w\right)} \quad\text{with } t = k + \ell - p.$$

The instruction 7: will be executed $1/P_{p,\ell}$ times on average. We have $|\mathcal{E}| = L^2 q^{-\ell}$ on average, thus the main loop is executed $1/(P_{p,\ell}|\mathcal{E}|)$ times, rounded up. Instructions 2: to 5: cost $\max(L, |\mathcal{E}|)$ up to a polynomial factor. In total, up to a polynomial factor, we thus have (29).


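In the sequel, we will denote

$$\mathrm{WF}_{\mathrm{Dumer}}(q, n, k, w) = \min_{(p,\ell,L)\in\mathcal{V}} \mathrm{WF}_{p,\ell,L}(q, n, k, w)$$

where $\mathcal{V} = \left\{(p, \ell, L) \;\middle|\; 0 \le p \le n - w,\ 0 \le \ell \le w - k + p,\ 1 \le L \le \binom{(k+\ell)/2}{p/2}(q-1)^{(k+\ell-p)/2}\right\}$ is the set of admissible optimization parameters.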

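Corollary 1. For any $(q, n, k, w)$, let $\ell_0$ denote the solution of $P_{0,\ell_0} = q^{-\ell_0}$. We have

$$\max\left(q^{\ell_0}, \frac{q^{2\ell_0}}{(q-1)^{(k+\ell_0)/2}}\right) \ge \mathrm{WF}_{\mathrm{Dumer}}(q, n, k, w) \ge q^{\ell_0} \qquad (31)$$

up to a polynomial factor.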


Proof. (Sketch) We will minimize formula (29) over $\mathcal{V}' = \{(p, \ell, L) \mid 0 \le p \le n - w,\ 0 \le \ell \le w - k + p,\ L \ge 1\}$, that is, we ignore the upper bound for $L$. Since $\mathcal{V} \subset \mathcal{V}'$, this will give us a lower bound for the work factor (whose tightness is discussed later).

When $q > 2$ and $w > k + \frac{q-1}{q}(n-k)$, and $L$ is not upper bounded, an easy analysis shows the following, up to a polynomial factor.

— It is best to choose $L = \sqrt{q^\ell/P_{p,\ell}}$ and $L = 1/P_{p,\ell}$. Thus parameters such that $L = q^\ell = 1/P_{p,\ell}$ are optimal.

— The function $p \mapsto P_{p,\ell}$ is decreasing with $p$ in the neighbourhood of the optimal parameters. It follows that $p = 0$ is optimal.

— The equation $q^\ell P_{0,\ell} = 1$ has a single solution $\ell_0 \in [0, w - k]$.

It follows that $(p, \ell, L) = (0, \ell_0, q^{\ell_0})$ minimizes (29) over $\mathcal{V}'$, and the right-hand side inequality of the statement holds. Let $L_0 = \min\left(q^{\ell_0}, (q-1)^{(k+\ell_0)/2}\right)$; we have $(0, \ell_0, L_0) \in \mathcal{V}$. The evaluation of the work factor for this set of parameters provides the left-hand side upper bound.
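Formulas (29) and (30) and the bounds of Corollary 1 can be evaluated directly. The sketch below is ours: it uses exact rationals for $P_{p,\ell}$ and $\mathrm{WF}_{p,\ell,L}$, takes for $\ell_0$ the smallest integer $\ell$ with $q^\ell P_{0,\ell} \ge 1$ (a discrete stand-in for the real solution), reads (31) as $q^{\ell_0} \le \mathrm{WF}_{\mathrm{Dumer}} \le \max\left(q^{\ell_0}, q^{2\ell_0}/(q-1)^{(k+\ell_0)/2}\right)$, and works in $\log_2$ for the bounds; all names are hypothetical. For the toy parameters $(q, n, k, w) = (3, 300, 150, 280)$, where $k/n = 0.5 \ge 0.369$, the two bounds coincide and $(p, \ell, L) = (0, \ell_0, 3^{\ell_0})$ reaches the lower bound exactly.

```python
from fractions import Fraction
from math import comb, log2

def p_pl(q, n, k, w, p, l):
    """P_{p,l} of (30), with t = k + l - p."""
    t = k + l - p
    num = q ** l * comb(n - k - l, w - t) * (q - 1) ** (w - t)
    den = min(q ** (n - k), comb(n, w) * (q - 1) ** w)
    return Fraction(num, den)

def wf_pll(q, n, k, w, p, l, L):
    """WF_{p,l,L} of (29): max(L, L^2/q^l) * max(1, q^l / (P_{p,l} L^2))."""
    P = p_pl(q, n, k, w, p, l)
    return (max(Fraction(L), Fraction(L * L, q ** l))
            * max(Fraction(1), Fraction(q ** l) / (P * L * L)))

def ell0(q, n, k, w):
    """Smallest integer l in [0, w - k] with q^l P_{0,l} >= 1, or None."""
    for l in range(w - k + 1):
        if q ** l * p_pl(q, n, k, w, 0, l) >= 1:
            return l
    return None

def corollary1_log2_bounds(q, n, k, w):
    """(l0, log2 lower bound, log2 upper bound), with (31) read as stated in the lead-in."""
    l0 = ell0(q, n, k, w)
    lower = l0 * log2(q)
    upper = max(lower, 2 * l0 * log2(q) - (k + l0) / 2 * log2(q - 1))
    return l0, lower, upper

q, n, k, w = 3, 300, 150, 280
l0, lower, upper = corollary1_log2_bounds(q, n, k, w)
print(l0, round(lower, 1), round(upper, 1))
print(wf_pll(q, n, k, w, 0, l0, q ** l0) == q ** l0)
```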
Tightness of Corollary 1. Every time $q^{\ell_0} \le (q-1)^{(k+\ell_0)/2}$ the bound (31) is tight. For high weights and non-binary alphabets, this happens for many system parameters of interest. It corresponds to situations where $(q-1)^{(k+\ell_0)/2}$ is large enough, which allows the choice of a list size $L$ large enough to succeed in a constant number of iterations even with $p = 0$.

In contrast, in the low weight case, the upper bound for $L$ is $\binom{(k+\ell)/2}{p/2}(q-1)^{p/2}$ and $p = 0$ corresponds to an irrelevant degenerate case where $L = 1$.

The bound of Corollary 1 is tight when

— $k/n \ge 1 - \log_q(q-1)$ and all $w$;

— $k/n < 1 - \log_q(q-1)$ and $w \le k + (n-k)\omega + (1-\omega)Ak$, where $A = \frac{\log(q-1)}{2\log q - \log(q-1)}$ and $\omega$ is the solution³ of $H_q(\omega) = \frac{n - (1+2A)k}{n - (1+A)k}$ larger than $\frac{q-1}{q}$. Note that $\omega$ exists only if $k/n < 1 - \log_q(q-1)$.

For $q = 3$, the bound is tight when $k/n \ge 0.369$. In the current work we will never use smaller code rates. As for Prange's algorithm, we may define the asymptotic exponent for given $R = k/n$ and $\omega = w/n$ as

$$c_q(R, \omega) = \lim_{n\to\infty} \frac{1}{n}\log_2 \mathrm{WF}_{\mathrm{Dumer}}(q, n, Rn, \omega n).$$

Exponents are plotted in Figure 10 for Prange and Dumer (bound) for $q = 3$, $R \in \{1/2, 1/4\}$, and large values of $\omega$.


Fig. 10. Asymptotic exponent of Dumer (dotted) vs. Prange for q = 3 (vertical axis: $c_3(R, \omega)$)



Further Improvement. Algorithm 6 for high error weights and $q > 2$ has an interesting feature when the code rate is large enough ($k/n \ge 1 - \log_q(q-1)$, see Corollary 1): the optimal value for the parameter $p$ is zero and the optimal value of the list size $L$ is below its upper bound $(q-1)^{(k+\ell)/2}$. It corresponds to a situation where the algorithm requires a single or a few iterations, and increasing $L$ would simply increase the cost of each iteration without the benefit of reducing the number of iterations.

For low weights, Dumer's algorithm performs an exponential number of loop iterations and the list size $L$ meets its upper bound $\binom{(k+\ell)/2}{p/2}(q-1)^{p/2}$. The best improvements [MMT11, BJMM12] use the representation technique: they essentially improve the CollisionSearch step to allow a larger upper bound for $L$. Those algorithms perform a smaller, but still exponential, number of iterations, each having a larger list size and thus an improved “birthday effect”. This will not happen in the high weight situation when $k/n \ge 1 - \log_q(q-1)$ ($k/n \ge 0.369$ for $q = 3$). The nearest neighbour approach [MO15] might still allow some improvement but needs to be thoroughly revisited.

³ $H_q(x) = -x\log_q\left(\frac{x}{q-1}\right) - (1-x)\log_q(1-x)$ is the $q$-ary entropy.
Previous Works. In the binary case, we have seen earlier that the syndrome decoding problem is equally hard for low and high weights. For larger alphabets there is no obvious reduction one way or the other between low and high weight problems, and high weight syndrome decoding does not seem easier than its low weight counterpart.

Except for the “furthest neighbour” algorithm mentioned in [Ind03], we are not aware of any algorithm in the literature related to decoding far from the received word.

More research is needed to improve the understanding of high weight syndrome decoding.

7.3 Application to the Proposed Signature 

In the current work, we will focus on the case $q = 3$ with high weight $w > k + \frac{2}{3}(n-k)$ and a large enough code rate $k/n \ge 1 - \log_q(q-1) \approx 0.369$. This corresponds to a situation where Problem 3 has an exponentially large number of solutions and the best known solver is Dumer's algorithm with $p = 0$ and a constant number of iterations.

DOOM vs. Collisions 

Problem 4 (Syndrome Collision). Given $H \in \mathbb{F}_q^{(n-k)\times n}$ and an integer $w$, find $e, e'$ distinct in $\mathbb{F}_q^n$ such that $eH^T = e'H^T$ and $|e| = |e'| = w$.

When $q \ge 3$ the above problem is not harder than finding a non-zero word $x \in \mathbb{F}_q^n$ of weight $\le 2w$ such that $xH^T = 0$. Indeed, given $x \ne 0$, it is a simple matter to find $e, e'$ such that $e - e' = x$ and $|e| = |e'| = w$ for any $w \ge |x|/2$. It follows that for high weights and a non-binary alphabet the Syndrome Collision problem is always easy.
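The reduction in the remark above is constructive. The following Python sketch (ours; the name `split_codeword` is hypothetical) builds $e$ and $e'$ over $\mathbb{F}_3$ from a non-zero $x$ and any $w$ with $|x|/2 \le w \le n$, using the three ways of writing a non-zero $x_i$ as a difference $e_i - e'_i$, namely $(x_i, 0)$, $(0, -x_i)$ and $(-x_i, x_i)$, and setting $e_i = e'_i \in \{0, 1\}$ outside $\mathrm{Supp}(x)$. If $xH^T = 0$, linearity then gives $eH^T = e'H^T$.

```python
def split_codeword(x, w):
    """Return e, e2 in F_3^n with e - e2 = x and |e| = |e2| = w (needs x != 0, |x|/2 <= w <= n)."""
    n = len(x)
    supp = [i for i, v in enumerate(x) if v % 3]
    m = len(supp)
    if not (m > 0 and (m + 1) // 2 <= w <= n):
        raise ValueError("need x != 0 and |x|/2 <= w <= n")
    # c positions of Supp(x) get both e_i, e2_i nonzero; f positions outside get e_i = e2_i = 1.
    if w >= m:
        c, f = m, w - m
    else:
        c, f = 2 * w - m, 0
    a = (m - c) // 2            # positions with only e_i nonzero (and as many with only e2_i)
    e, e2 = [0] * n, [0] * n
    for idx, i in enumerate(supp):
        xi = x[i] % 3
        if idx < a:             # (x_i, 0)
            e[i] = xi
        elif idx < 2 * a:       # (0, -x_i)
            e2[i] = (-xi) % 3
        else:                   # (-x_i, x_i): -x_i - x_i = -2 x_i = x_i in F_3
            e[i], e2[i] = (-xi) % 3, xi
    outside = [i for i in range(n) if x[i] % 3 == 0]
    for i in outside[:f]:
        e[i] = e2[i] = 1
    return e, e2

x = [1, 2, 0, 0, 1, 0, 2, 0, 0, 0]   # |x| = 4
for w in range(2, len(x) + 1):
    e, e2 = split_codeword(x, w)
    assert [(a - b) % 3 for a, b in zip(e, e2)] == x
    assert sum(1 for v in e if v) == w == sum(1 for v in e2 if v)
print("ok")
```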

The key primitive in our design is the syndrome map $e \mapsto eH^T$ with $|e| = w$. In the GPV setting [GPV08], one of the features needed for designing a secure signature scheme is collision-freeness. From the above remark, this cannot be achieved for syndromes with high $w$. Instead, our reduction will require the primitive to be resistant to multi-target preimages, namely the Decoding One Out of Many (DOOM) problem.

Problem 2 (DOOM - Decoding One Out of Many). Given $H \in \mathbb{F}_q^{(n-k)\times n}$, $s_1, \dots, s_N \in \mathbb{F}_q^{n-k}$, and an integer $w$, find $e \in \mathbb{F}_q^n$ and $i$, $1 \le i \le N$, such that $eH^T = s_i$ and $|e| = w$.

In DOOM, the adversary considers an arbitrarily large number of instances of syndrome decoding in which only the syndrome $s$ varies, and needs to solve only one of them.

What the DOOM variant of ISD [Sen11] does to deal with $N$ instances consists essentially in (i) multiplying the cost of the iteration, in fact the list size, by $\sqrt{N}$, and (ii) dividing the number of iterations by $N$. The number of iterations cannot be lower than 1, and this limits in practice the number of instances that can be efficiently treated simultaneously. In our case, Dumer's algorithm is already reduced to a single iteration and no improvement is possible with this method. With the current state of the art, the DOOM problem for high weights does not appear easier to solve than the Syndrome Decoding problem.

8 Distinguishing a Permuted Admissible Generalized (U, U + V) Code

An admissible generalized $(U, U+V)$ code where $U$ and $V$ are random seems very close to a random linear code. We assume in the whole section that such a code is defined from a 4-tuple of matrices that we denote by $(D_1, D_2, D_3, D_4)$. There is, for instance, only a very slight difference between the weight distribution of a random linear code and that of a random admissible generalized $(U, U+V)$-code of the same length and dimension. This slight difference happens for small and large weights and is due to codewords where $v = 0$ or $u = 0$, which are of the form $(uD_1, uD_3)$ where $u$ belongs to $U$, or of the form $(vD_2, vD_4)$ where $v$ belongs to $V$. This weight distribution depends on the matrices $D_i$. The definition of the number of $V$ blocks (see Definition 3 in §5.1) is helpful for describing the codewords of the form $(vD_2, vD_4)$. With this definition at hand, we have the following proposition.
Proposition 7. Assume that we choose an admissible generalized $(U, U+V)$ code over $\mathbb{F}_3$ with a number $n_I$ of linear combinations of type I by picking the parity-check matrices of $U$ and $V$ uniformly at random among the ternary matrices of size $(n/2 - k_U) \times n/2$ and $(n/2 - k_V) \times n/2$ respectively. Let $a_{(U,V)}(w)$, $a_{(U,0)}(w)$ and $a_{(0,V)}(w)$ be the expected number of codewords of weight $w$ that are respectively in the admissible generalized $(U, U+V)$ code, of the form $(uD_1, uD_3)$ where $u$ belongs to $U$, and of the form $(vD_2, vD_4)$ where $v$ belongs to $V$. These numbers are given for even $w$ in $\{0, \dots, n\}$ by

$$a_{(U,0)}(w) = \frac{1}{3^{n/2-k_U}}\binom{n/2}{w/2}2^{w/2},$$

$$a_{(0,V)}(w) = \frac{1}{3^{n/2-k_V}}\sum_{j \text{ even}}\binom{n/2-n_I}{\frac{w-j}{2}}\binom{n_I}{j}2^{(w+j)/2},$$

$$a_{(U,V)}(w) = a_{(U,0)}(w) + a_{(0,V)}(w) + \frac{1}{3^{n-k_U-k_V}}\left(\binom{n}{w}2^w - \binom{n/2}{w/2}2^{w/2} - \sum_{j \text{ even}}\binom{n/2-n_I}{\frac{w-j}{2}}\binom{n_I}{j}2^{(w+j)/2}\right),$$

and for odd $w$ in $\{0, \dots, n\}$ by

$$a_{(U,0)}(w) = 0, \qquad a_{(0,V)}(w) = \frac{1}{3^{n/2-k_V}}\sum_{j \text{ odd}}\binom{n/2-n_I}{\frac{w-j}{2}}\binom{n_I}{j}2^{(w+j)/2},$$

$$a_{(U,V)}(w) = a_{(0,V)}(w) + \frac{1}{3^{n-k_U-k_V}}\left(\binom{n}{w}2^w - \sum_{j \text{ odd}}\binom{n/2-n_I}{\frac{w-j}{2}}\binom{n_I}{j}2^{(w+j)/2}\right).$$

On the other hand, when we choose a linear code of length $n$ over $\mathbb{F}_3$ with a parity-check matrix of size $(n - k_U - k_V) \times n$ chosen uniformly at random, the expected number $a(w)$ of codewords of weight $w > 0$ is given by

$$a(w) = \frac{\binom{n}{w}2^w}{3^{n-k_U-k_V}}.$$


The proof of this proposition is in Appendix §C 


Remark 6. When the generalized (U, U + V) code is chosen in this way, its dimension is $k_U + k_V$ with probability $1 - O\left(\max\left(3^{k_U - n/2}, 3^{k_V - n/2}\right)\right)$. This also holds for the random codes of length $n$.


We have plotted in Figure 11 the normalized logarithm of the density of codewords of the form $(uD_1, uD_3)$ and $(vD_2, vD_4)$ of relative even weight $x = w/n$ against $x$, in the case where U is of rate 0.7, V is of rate 0.3 and $n_I/n = \ldots$. These two relative densities are defined respectively by

$$\alpha_U(w/n) \triangleq \frac{\log_2\left(a_{(U,0)}(w)/a_{(U,V)}(w)\right)}{n}, \qquad \alpha_V(w/n) \triangleq \frac{\log_2\left(a_{(0,V)}(w)/a_{(U,V)}(w)\right)}{n}.$$

We see that in this case, for a relative weight $w/n$ below approximately 0.26, almost all the codewords are of the form $(uD_1, uD_3)$.
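The expected counts of Proposition 7 and the densities $\alpha_U$, $\alpha_V$ can be evaluated exactly with rational arithmetic. The sketch below is our own illustration, valid for $w \ge 1$ (the zero codeword is not covered by the probabilistic model); all function names are hypothetical, and the logarithms are taken on exact integers to avoid floating-point underflow for large $n$.

```python
from fractions import Fraction
from math import comb, log2


def binom(a: int, b: int) -> int:
    """Binomial coefficient that is 0 outside 0 <= b <= a."""
    return comb(a, b) if 0 <= b <= a else 0


def count_U0(n: int, w: int) -> int:
    """Number of words (u D1, u D3) of weight w when u ranges over F_3^{n/2}."""
    if w % 2:
        return 0
    return binom(n // 2, w // 2) * 2 ** (w // 2)


def count_0V(n: int, n_I: int, w: int) -> int:
    """Number of words (v D2, v D4) of weight w when v ranges over F_3^{n/2}.

    j counts the non-zero entries of v on the n_I type-I positions (weight 1
    each); the (w - j)/2 other non-zero entries each contribute weight 2.
    """
    return sum(
        binom(n_I, j) * binom(n // 2 - n_I, (w - j) // 2) * 2 ** ((w + j) // 2)
        for j in range(w % 2, w + 1, 2)
    )


def enumerators(n: int, k_U: int, k_V: int, n_I: int, w: int):
    """Return (a_(U,0), a_(0,V), a_(U,V), a) of Proposition 7 for w >= 1."""
    a_U0 = Fraction(count_U0(n, w), 3 ** (n // 2 - k_U))
    a_0V = Fraction(count_0V(n, n_I, w), 3 ** (n // 2 - k_V))
    generic = binom(n, w) * 2**w - count_U0(n, w) - count_0V(n, n_I, w)
    a_UV = a_U0 + a_0V + Fraction(generic, 3 ** (n - k_U - k_V))
    a_rand = Fraction(binom(n, w) * 2**w, 3 ** (n - k_U - k_V))
    return a_U0, a_0V, a_UV, a_rand


def log2_ratio(x: Fraction, y: Fraction) -> float:
    """log2(x / y) computed from exact integers (x and y must be positive)."""
    r = x / y
    return log2(r.numerator) - log2(r.denominator)


def densities(n: int, k_U: int, k_V: int, n_I: int, w: int):
    """(alpha_U(w/n), alpha_V(w/n)) for an even weight w >= 2."""
    a_U0, a_0V, a_UV, _ = enumerators(n, k_U, k_V, n_I, w)
    return log2_ratio(a_U0, a_UV) / n, log2_ratio(a_0V, a_UV) / n
```

Summing each enumerator over $w = 1, \dots, n$ recovers the expected number of non-zero codewords of each kind, which is a convenient sanity check of the formulas.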

Since the weight distribution is invariant under permutation of the positions, this slight difference also survives in the permuted version of the admissible generalized (U, U + V) code. These considerations lead to the best attack we have found for recovering the structure of a permuted admissible generalized (U, U + V) code. It consists in applying known algorithms that aim at recovering low-weight codewords of a linear code. We run such an algorithm until we obtain either a permuted $(uD_1, uD_3)$ codeword where $u$ is in U or a permuted $(vD_2, vD_4)$ codeword where $v$ belongs to V. The rationale behind this algorithm is that the density of codewords of the form $(uD_1, uD_3)$ or $(vD_2, vD_4)$ increases as the weight of the codeword decreases.

Fig. 11. $\alpha_U(w/n)$ and $\alpha_V(w/n)$ against $x = w/n$.

Once we have such a codeword we can bootstrap from there in a way very similar to what has been done in [OT11, Subs. 4.4]. Note that this attack is actually very close in spirit to the attack that was devised against the KKS signature scheme [OT11]. In essence, the attack against the KKS scheme amounts to recovering the support of the V code. The difference with the KKS scheme is that the support of V is much bigger in our case. As explained in the conclusion of [OT11], the attack against the KKS scheme has in essence an exponential complexity. This exponent becomes prohibitive in our case when the parameters of U and V are chosen appropriately, as we now explain.


8.1 Recovering the U Code up to Permutation 

We consider here the permuted code 

$$U' = (UD_1, UD_3)P = \{(uD_1, uD_3)P : u \in U\}.$$

The attack in this case consists in recovering a basis of U'. Once this is done, it is easy to recover the U code up to permutation by matching the pairs of coordinates which are either always equal or always sum to 0 in U'. The basic algorithm for recovering the code U' is given in Algorithm 7. It uses the following auxiliary functions:

— Codewords($\mathrm{Punc}_I(C_{pk}), p$), which computes all (or a large fraction of) the codewords of weight $p$ of the punctured public code $\mathrm{Punc}_I(C_{pk})$. All modern algorithms for decoding linear codes [Dum91, FS09, MMT11, BJMM12, MO15] perform such a task in their inner loop.

— Complete($x, I, C_{pk}$), which computes the codeword $c$ in $C_{pk}$ whose restriction outside $I$ is equal to $x$.

— CheckU($x$), which checks whether $x$ belongs to U'.
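The matching step mentioned above (recovering U up to permutation from a basis of U') can be sketched as follows. This is a minimal illustration under two assumptions of ours: a basis of U' is already known, and no two columns coincide up to sign by accident. Columns are normalized so that their first non-zero entry is 1, so matched columns (equal up to a factor $\lambda \in \{\pm 1\}$) share the same key. The function names are hypothetical.

```python
from collections import defaultdict


def normalize(col: tuple) -> tuple:
    """Scale a non-zero ternary column so that its first non-zero entry is 1."""
    lead = next(x for x in col if x % 3)
    # In F_3, 1 and 2 are their own inverses, so multiplying by `lead` normalizes.
    return tuple((x * lead) % 3 for x in col)


def matched_pairs(basis: list) -> list:
    """Group the coordinates of U' into matched pairs.

    `basis` is a list of rows over F_3 (entries in {0, 1, 2}) spanning U'.
    Returns sorted triples (i, j, lam) with i < j and column j = lam * column i.
    Assumes every column is non-zero and that only true pairs coincide.
    """
    n = len(basis[0])
    cols = [tuple(row[c] % 3 for row in basis) for c in range(n)]
    groups = defaultdict(list)
    for c, col in enumerate(cols):
        groups[normalize(col)].append(c)
    pairs = []
    for members in groups.values():
        if len(members) != 2:
            raise ValueError(f"unexpected group of size {len(members)}")
        i, j = members
        lam = 1 if cols[i] == cols[j] else -1
        pairs.append((i, j, lam))
    return sorted(pairs)
```

For instance, the basis rows `[1,0,0,2,0,0]`, `[0,1,0,0,1,0]`, `[0,0,1,0,0,2]` (an unpermuted toy $(uD_1, uD_3)$ code) yield the pairs `(0,3,-1)`, `(1,4,1)`, `(2,5,-1)`.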


Choosing N Appropriately. Let us first analyse how N has to be chosen so that ComputeU returns $\Omega(1)$ elements. This is essentially the analysis which can be found in [OT11, Subsec. 5.2]. This analysis leads to the following result.
Algorithm 7 ComputeU: algorithm that computes a set of independent elements in U'.
Parameters: (i) ℓ: small integer (typically ℓ ≤ 40),
            (ii) p: very small integer (typically 1 ≤ p ≤ 10).
Input: (i) C_pk, the public code used for verifying signatures,
       (ii) N, a certain number of iterations.
Output: an independent set of elements in U'.

1: function ComputeU(C_pk, N)
2:   for i = 1, ..., N do
3:     B ← ∅
4:     Choose a set I ⊂ {1, ..., n} of size n − k − ℓ uniformly at random
5:     C ← Codewords(Punc_I(C_pk), p)
6:     for all x ∈ C do
7:       x ← Complete(x, I, C_pk)
8:       if CheckU(x) then
9:         add x to B if x ∉ ⟨B⟩
10:  return B


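Proposition 8. The probability $P_{succ}$ that one iteration of the for loop (Instruction 2) in ComputeU adds elements to the list B is lower-bounded by

$$P_{succ} \ge \sum_{w=0}^{n/2} \frac{\binom{n/2}{w}\binom{n/2-w}{k+\ell-2w} 2^{k+\ell-2w}}{\binom{n}{k+\ell}} \max_{0 \le i \le \lfloor p/2 \rfloor} f\left( \frac{\binom{k+\ell-2w}{p-2i}\binom{w}{i} 2^{p-i}}{3^{\max(0,\,k+\ell-w-k_U)}} \right) \tag{32}$$

where $f$ is the function defined by $f(x) = \max\left(x(1 - x/2), 1 - \frac{1}{x}\right)$. Algorithm 7 returns a non-zero list with probability $\Omega(1)$ when $N$ is chosen as $N = \Omega\left(\frac{1}{P_{succ}}\right)$.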

Proof. It will be helpful to recall [OT11, Lemma 3].

Lemma 6. Choose a random code $C_{rand}$ of length $n$ from a parity-check matrix of size $r \times n$ chosen uniformly at random in $\mathbb{F}_3^{r \times n}$. Let $X$ be some subset of $\mathbb{F}_3^n$ of size $m$. We have

$$\mathbb{P}(X \cap C_{rand} \neq \emptyset) \ge f\left(\frac{m}{3^r}\right).$$

We say that two positions $i$ and $j$ are matched (for U') if and only if there exists $\lambda \in \{\pm 1\}$ such that $c_j = \lambda c_i$ for every $c \in U'$. Since we only consider admissible generalized (U, U + V) codes, there are clearly $n/2$ pairs of matched positions. Let $W$ be the number of matched pairs that are included in $\{1, \dots, n\} \setminus I$, where $I$ is the random set of size $n - k - \ell$ drawn in Instruction 4 of Algorithm 7. We compute the probability of success by conditioning on the values taken by $W$:


$$P_{succ} = \sum_{w=0}^{n/2} \mathbb{P}\left(\exists x \in U' : |x_{\bar{I}}| = p \mid W = w\right) \mathbb{P}(W = w) \tag{33}$$

where $\bar{I} = \{1, \dots, n\} \setminus I$. Notice that we can partition $\bar{I}$ as $\bar{I} = J_1 \cup J_2$, where $J_2$ consists of the union of the matched pairs in $\bar{I}$. Note that $|J_2| = 2w$. We may further partition $J_2$ as $J_2 = J_{21} \cup J_{22}$, where the two elements of each matched pair are split between the two sets. In other words, neither $J_{21}$ nor $J_{22}$ contains a matched pair. We are going to consider the codes

$$U'' = \mathrm{Punc}_I(U'), \qquad U''' = \mathrm{Punc}_{I \cup J_{22}}(U').$$
The last code is of length $n - (n - k - \ell + w) = k + \ell - w$, as $|J_{22}| = w$ and $|I| = n - k - \ell$. The point of defining the first code is that

$$\mathbb{P}\left(\exists x \in U' : |x_{\bar{I}}| = p \mid W = w\right)$$

is equal to the probability that $U''$ contains a codeword of weight $p$. The problem is that we cannot apply Lemma 6 to it due to the matched positions it contains (the code is not random). This is precisely the point of defining $U'''$: we can consider that it is a random code whose parity-check matrix is chosen uniformly at random among the matrices of size $\max(0, k + \ell - w - k_U) \times (k + \ell - w)$, and we can therefore apply Lemma 6 to it. We have to be careful about the words of weight $p$ in $U''$ though, since they do not all have the same probability of occurring in $U''$, due to the possible presence of matched pairs in their support. This is why we introduce, for $i$ in $\{0, \dots, \lfloor p/2 \rfloor\}$, the sets $X_i$ defined as follows:

$$X_i = \left\{ x = (x_j)_{j \in \bar{I} \setminus J_{22}} \in \mathbb{F}_3^{k+\ell-w} : |x_{J_1}| = p - 2i,\ |x_{J_{21}}| = i \right\}.$$

A codeword of weight $p$ in $U''$ corresponds to some word in one of the $X_i$'s by puncturing it in $J_{22}$. We obviously have the lower bound


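$$\mathbb{P}\left(\exists x \in U' : |x_{\bar{I}}| = p \mid W = w\right) \ge \max_{0 \le i \le \lfloor p/2 \rfloor} \mathbb{P}\left(X_i \cap U''' \neq \emptyset\right). \tag{34}$$

Since $|J_1| = k + \ell - 2w$, $|J_{21}| = w$ and each of the $p - i$ non-zero entries of a word of $X_i$ takes one of two values, we have $|X_i| = \binom{k+\ell-2w}{p-2i}\binom{w}{i} 2^{p-i}$. By using Lemma 6 we have

$$\mathbb{P}\left(X_i \cap U''' \neq \emptyset\right) \ge f\left( \frac{\binom{k+\ell-2w}{p-2i}\binom{w}{i} 2^{p-i}}{3^{\max(0,\,k+\ell-w-k_U)}} \right). \tag{35}$$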


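On the other hand, we may notice that

$$\mathbb{P}(W = w) = \frac{\binom{n/2}{w}\binom{n/2-w}{k+\ell-2w} 2^{k+\ell-2w}}{\binom{n}{k+\ell}}.$$

These considerations lead to the following lower bound on $P_{succ}$:

$$P_{succ} \ge \sum_{w=0}^{n/2} \frac{\binom{n/2}{w}\binom{n/2-w}{k+\ell-2w} 2^{k+\ell-2w}}{\binom{n}{k+\ell}} \max_{0 \le i \le \lfloor p/2 \rfloor} f\left( \frac{\binom{k+\ell-2w}{p-2i}\binom{w}{i} 2^{p-i}}{3^{\max(0,\,k+\ell-w-k_U)}} \right).$$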


Complexity of Recovering a Permuted Version of U. The complexity of a call to ComputeU can be estimated as follows. We denote by $C_1(p, k, \ell)$ the complexity of computing the list of codewords of weight $p$ in a code of length $k + \ell$ and dimension $k$. It depends on the particular algorithm used here; for more details see [Dum91, FS09, MMT11, BJMM12, MO15]. This is the complexity of the call Codewords($\mathrm{Punc}_I(C_{pk}), p$) in Step 5 of Algorithm 7. The complexity of ComputeU, and hence the complexity of recovering a permuted version of U, is clearly lower bounded by $\Omega\left(\frac{C_1(p,k,\ell)}{P_{succ}}\right)$. It turns out that the whole complexity of recovering a permuted version of U is actually of this order, namely $O\left(\frac{C_1(p,k,\ell)}{P_{succ}}\right)$. This can be done by a combination of two techniques:

— Once a non-zero element of U' has been identified, it is much easier to find other ones. This uses one of the tricks for breaking the KKS scheme (see [OT11, Subs. 4.4]). The point is the following: if we restart the procedure ComputeU, but this time choose the set I on which we puncture the code so that it contains the support of the codeword we already found, then the number N of iterations needed to find a new element is negligible compared to the original value of N.

— The call to CheckU can be implemented in such a way that the additional complexity coming from all the calls to this function is of the same order as that of the N calls to Codewords. The strategy to adopt depends on the values of the dimensions k and $k_U$. In certain cases, it is easy to detect such codewords since they have a typical weight that is significantly smaller than that of the other codewords. In more complicated cases, we might first check the weight of x: if it is above some prescribed threshold, we decide that x is not in U'; if it is below the threshold, we consider x a suspicious candidate and then use the previous trick. Namely, we check whether the support of the codeword x can be used to find other suspicious candidates much more quickly than by performing N calls to CheckU.

To keep the length of this paper within reasonable limits, we do not give here the analysis of those steps and simply use the aforementioned lower bound on the complexity of recovering a permuted version of U.

8.2 Recovering the V Code up to a Permutation 

We consider here the permuted code 

$$V' = (VD_2, VD_4)P = \{(vD_2, vD_4)P : v \in V\}.$$

The attack in this case consists in recovering a basis of V'. Once this is achieved, the support $\mathrm{Supp}(V')$ of V' can easily be obtained. Recall that this is the set of positions for which there exists at least one codeword of V' that is non-zero in this position. This allows us to easily recover the code V up to some permutation. The algorithm for recovering V' is the same as the algorithm for recovering U'. We nevertheless call the associated function ComputeV, since the two differ in the choice of N: the analysis is indeed slightly different.


Choosing N Appropriately. As in the previous subsection, let us analyse how N has to be chosen so that ComputeV returns $\Omega(1)$ elements of V'. We have in this case the following result.

Proposition 9. The probability $P_{succ}$ that one iteration of the for loop (Instruction 2) in ComputeV adds elements to the list B is lower-bounded by

$$P_{succ} \ge \sum_{w=0}^{\min(n-k-\ell,\,n-n_I)} \sum_{m=0}^{n/2-n_I} \frac{\binom{n/2-n_I}{m}\binom{n_I}{n-k-\ell-w}}{\binom{n}{n-k-\ell}} \left( \sum_{j=0}^{n/2-n_I-m} \binom{n/2-n_I-m}{j} 2^j \binom{n_I}{w-n+2n_I+2m+j} \right) \max_{0 \le i \le \lfloor p/2 \rfloor} f\left( \frac{\binom{n-n_I-w-2m}{p-2i}\binom{m}{i} 2^{p-i}}{3^{\max(0,\,n-n_I-w-m-k_V)}} \right)$$

where $f$ is the function defined by $f(x) = \max\left(x(1 - x/2), 1 - \frac{1}{x}\right)$. ComputeV returns a non-zero list with probability $\Omega(1)$ when $N$ is chosen as $N = \Omega\left(\frac{1}{P_{succ}}\right)$.

Proof. To lower-bound the probability $P_{succ}$ that an iteration is successful, let us first introduce the concept of matched positions (for V'). We say that two positions $i$ and $j$ are matched if and only if there exists $\lambda \in \{\pm 1\}$ such that $c_j = \lambda c_i$ for every $c \in V'$. There are clearly $n/2 - n_I$ pairs of matched positions. Let us define the following set: $J$ is the set of positions that are the images under the permutation $P$ of the positions $1 \le i \le n/2$ such that $D_2(i,i) \neq 0$ and of the positions $n/2 + j$ with $1 \le j \le n/2$ such that $D_4(j,j) \neq 0$.

Remark 7. From Definition 3 it follows that

$$|J| = n - n_I$$

(see Remark 3 in §5.1).
Let us now bring in the random variables

$$I' \triangleq I \cap J \quad \text{and} \quad W \triangleq |I'|,$$

and let $M$ be the number of matched pairs which are included in $J \setminus I'$. The set $J \setminus I'$ is the set of positions that are not necessarily equal to 0 in the punctured code $\mathrm{Punc}_I(V')$ (see Figure 12).

Fig. 12. A figure representing J, I and I' and the form of a codeword in V'.

ComputeV outputs at least one element of V' if there is an element of weight $p$ in $\mathrm{Punc}_I(V')$. Therefore the probability of success $P_{succ}$ is given by

$$P_{succ} = \sum_{w=0}^{\min(n-k-\ell,\,n-n_I)} \sum_{m=0}^{n/2-n_I} \mathbb{P}\left(\exists x \in V' : |x_{J'}| = p \mid W = w, M = m\right) \mathbb{P}(W = w, M = m) \tag{36}$$

where

$$J' \triangleq J \setminus I'.$$

Notice that we can partition $J'$ as $J' = J_1 \cup J_2$, where $J_2$ consists of the union of the matched pairs in $J'$. Note that $|J_2| = 2m$. We may further partition $J_2$ as $J_2 = J_{21} \cup J_{22}$, where the two elements of each matched pair are split between the two sets. In other words, neither $J_{21}$ nor $J_{22}$ contains a matched pair. Writing $\bar{J} = \{1, \dots, n\} \setminus J$, we are going to consider the following codes

$$V'' = \mathrm{Punc}_{I \cup \bar{J}}(V'), \qquad V''' = \mathrm{Punc}_{I \cup \bar{J} \cup J_{22}}(V').$$

$V''$ is of length $n - n_I - w$, whereas the last code is of length $n - n_I - w - m$. The point of defining the first code is that

$$\mathbb{P}\left(\exists x \in V' : |x_{J'}| = p \mid W = w, M = m\right)$$

is equal to the probability that $V''$ contains a codeword of weight $p$. The problem is that we cannot apply Lemma 6 to it due to the matched positions it contains. This is precisely the point of defining $V'''$: we can consider that it is a random code whose parity-check matrix is chosen uniformly at random among the matrices of size $\max(0, n - n_I - w - m - k_V) \times (n - n_I - w - m)$, and we can therefore apply Lemma 6 to it. We have to be careful about the words of weight $p$ in $V''$ though, since they do not all have the same probability of occurring in $V''$, due to the possible presence of matched pairs in their support. This is why we introduce, for $i$ in $\{0, \dots, \lfloor p/2 \rfloor\}$, the sets $X_i$ defined as follows:

$$X_i = \left\{ x = (x_j)_{j \in J' \setminus J_{22}} \in \mathbb{F}_3^{n-n_I-w-m} : |x_{J_1}| = p - 2i,\ |x_{J_{21}}| = i \right\}.$$

A codeword of weight $p$ in $V''$ corresponds to some word in one of the $X_i$'s by puncturing it in $J_{22}$. We obviously have the lower bound

$$\mathbb{P}\left(\exists x \in V' : |x_{J'}| = p \mid W = w, M = m\right) \ge \max_{0 \le i \le \lfloor p/2 \rfloor} \mathbb{P}\left(X_i \cap V''' \neq \emptyset\right). \tag{37}$$
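By using Lemma 6, and since $|X_i| = \binom{n-n_I-w-2m}{p-2i}\binom{m}{i} 2^{p-i}$, we have

$$\mathbb{P}\left(X_i \cap V''' \neq \emptyset\right) \ge f\left( \frac{\binom{n-n_I-w-2m}{p-2i}\binom{m}{i} 2^{p-i}}{3^{\max(0,\,n-n_I-w-m-k_V)}} \right). \tag{38}$$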


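On the other hand, we have

$$\mathbb{P}(W = w, M = m) = \frac{\binom{n/2-n_I}{m}\binom{n_I}{n-k-\ell-w}}{\binom{n}{n-k-\ell}} \sum_{j=0}^{n/2-n_I-m} \binom{n/2-n_I-m}{j} 2^j \binom{n_I}{w-n+2n_I+2m+j},$$

where $j$ counts the matched pairs of $J$ that meet $I$ in exactly one position.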

These considerations lead to the following lower bound on $P_{succ}$:

$$P_{succ} \ge \sum_{w=0}^{\min(n-k-\ell,\,n-n_I)} \sum_{m=0}^{n/2-n_I} \frac{\binom{n/2-n_I}{m}\binom{n_I}{n-k-\ell-w}}{\binom{n}{n-k-\ell}} \left( \sum_{j=0}^{n/2-n_I-m} \binom{n/2-n_I-m}{j} 2^j \binom{n_I}{w-n+2n_I+2m+j} \right) \max_{0 \le i \le \lfloor p/2 \rfloor} f\left( \frac{\binom{n-n_I-w-2m}{p-2i}\binom{m}{i} 2^{p-i}}{3^{\max(0,\,n-n_I-w-m-k_V)}} \right).$$

The claim on the number N of iterations follows directly from this. 
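As for ComputeU, the bound of Proposition 9 can be transcribed into exact arithmetic. This is our own sketch with hypothetical function names; a useful check, performed in the tests, is that the probabilities $\mathbb{P}(W = w, M = m)$ sum to exactly 1 over the whole summation domain, which confirms the counting argument (including the role of $j$).

```python
from fractions import Fraction
from math import comb


def binom(a: int, b: int) -> int:
    """Binomial coefficient that is 0 outside 0 <= b <= a."""
    return comb(a, b) if 0 <= b <= a else 0


def f(x: Fraction) -> Fraction:
    """f(x) = max(x(1 - x/2), 1 - 1/x) from Lemma 6, with f(0) = 0."""
    if x == 0:
        return Fraction(0)
    return max(x * (1 - x / 2), 1 - 1 / x)


def prob_WM(n: int, k: int, ell: int, n_I: int, w: int, m: int) -> Fraction:
    """P(W = w, M = m); j counts the matched pairs of J hit exactly once by I."""
    size_I = n - k - ell
    pairs = n // 2 - n_I  # number of matched pairs of V'
    inner = sum(
        binom(pairs - m, j) * 2**j * binom(n_I, w - n + 2 * n_I + 2 * m + j)
        for j in range(pairs - m + 1)
    )
    num = binom(pairs, m) * binom(n_I, size_I - w) * inner
    return Fraction(num, binom(n, size_I))


def p_succ_V(n: int, k: int, k_V: int, n_I: int, ell: int, p: int) -> Fraction:
    """Lower bound of Proposition 9 on the success probability of ComputeV."""
    total = Fraction(0)
    for w in range(min(n - k - ell, n - n_I) + 1):
        for m in range(n // 2 - n_I + 1):
            pwm = prob_WM(n, k, ell, n_I, w, m)
            if pwm == 0:
                continue
            r = max(0, n - n_I - w - m - k_V)  # number of parity checks of V'''
            best = max(
                f(Fraction(binom(n - n_I - w - 2 * m, p - 2 * i) * binom(m, i) * 2 ** (p - i), 3**r))
                for i in range(p // 2 + 1)
            )
            total += pwm * best
    return total
```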


Complexity of Recovering a Permuted Version of V. As for recovering the permuted U code, the complexity of recovering the permuted V code is of order $\Omega\left(\frac{C_1(p,k,\ell)}{P_{succ}}\right)$.


8.3 Distinguishing a Generalized (U, U + V) Code

In the second case, it is not clear that from the sole knowledge of V' and a permuted version of V we are able to find a permutation of the positions which gives the whole code the structure of a generalized (U, U + V) code. However, in both cases a single successful call to ComputeV (resp. ComputeU) already distinguishes the code from a random code of the same length and dimension. In other words, we have a distinguishing attack whose complexity is given by the following proposition.


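Proposition 10. The aforementioned algorithms lead to a distinguishing attack whose complexity is given by $\min\left(O\left(\min_{p,\ell} C_U(p,\ell)\right), O\left(\min_{p,\ell} C_V(p,\ell)\right)\right)$, where

$$C_U(p,\ell) = \frac{C_1(p,k,\ell)}{\displaystyle\sum_{w=0}^{n/2} \frac{\binom{n/2}{w}\binom{n/2-w}{k+\ell-2w} 2^{k+\ell-2w}}{\binom{n}{k+\ell}} \max_{0 \le i \le \lfloor p/2 \rfloor} f\left( \frac{\binom{k+\ell-2w}{p-2i}\binom{w}{i} 2^{p-i}}{3^{\max(0,\,k+\ell-w-k_U)}} \right)} \tag{39}$$

$$C_V(p,\ell) = \frac{C_1(p,k,\ell)}{\displaystyle\sum_{(w,m,j) \in \mathcal{I}} \frac{\binom{n/2-n_I}{m}\binom{n_I}{n-k-\ell-w}\binom{n/2-n_I-m}{j} 2^j \binom{n_I}{w-n+2n_I+2m+j}}{\binom{n}{n-k-\ell}} \max_{0 \le i \le \lfloor p/2 \rfloor} f\left( \frac{\binom{n-n_I-w-2m}{p-2i}\binom{m}{i} 2^{p-i}}{3^{\max(0,\,n-n_I-w-m-k_V)}} \right)} \tag{40}$$

where $C_1(p, k, \ell)$ is the complexity of computing a constant fraction (say half) of the codewords of weight $p$ in a code of length $k + \ell$ and dimension $k$, and $f$ is the function $f(x) = \max\left(x(1 - x/2), 1 - \frac{1}{x}\right)$. The sum in the denominator of (40) is over the domain $\mathcal{I} = \{(w, m, j) \mid 0 \le w \le \min(n - k - \ell, n - n_I),\ 0 \le m \le n/2 - n_I,\ 0 \le j \le n/2 - n_I - m\}$.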
9 Parameter Selection

In the light of the security proof in §6 and of the rejection sampling method in §5, we need to derive parameters which lead to a negligible success probability for the following two problems:

1. Solve a syndrome decoding problem with multiple instances (DOOM) for parameters n, k, w and an arbitrarily large number of instances.

2. Distinguish public matrices of the admissible generalized (U, U + V) code family from random matrices of the same size.

To specify the signature scheme, we need to choose the salt size $\lambda_0$. From the security proof, it is sufficient to have $\lambda_0 = \log_2(q_{sign})$, where $q_{sign}$ is the number of signature queries allowed to the adversary. Since $q_{sign} < 2^\lambda$ ($\lambda$ being the security parameter), we choose a conservative $\lambda_0 = \lambda$. We gave in §7 and §8 state-of-the-art algorithms for the two problems mentioned above. This served as a basis for the parameters proposed in Table 1.

For any set of parameters $(n, k, w, k_U, k_V)$, the message security is based on the cost of Dumer's algorithm, as mentioned in §7; in the range of parameters we consider here, this cost cannot be improved as it is in the binary case. Moreover, considering multiple targets (DOOM) does not seem to give an advantage to the attacker. For the range of parameters we have explored, the key security seems to depend solely on the attacks on U, that is, on (39).

The essential point for parameter selection is the number $a$ introduced at the end of §5. It is a number between 0 and 1 which affects the distribution of the signature weight. We need $a < 1$ to allow an efficient rejection sampling. Large values of $a$ favour message security, while small values favour key security. For each value of $a$ and $k/n$, we derive from equations (23), (24) and (25) the values of $w/n$, $k_U/n$ and $k_V/n$, and thus the security estimates from §7 and §8.

The optimal pair in this respect is $(a, k/n) = (0.545, 0.7555)$, for which, in the current state of the art, the best attack has a cost $2^{cn}$ with $c = 0.02464$. The choice is made such that key attacks and message attacks have the same cost. For 128 bits of security against a classical adversary, we obtain the numbers of Table 1. Recall that we are using the ternary alphabet $\mathbb{F}_3$. Those parameters scale linearly with the security level, except for the key size, which grows as the square of the security level. We did not investigate specific quantum attacks, but since known attacks are based on decoding problems, it should be more than enough to increase the classical exponent by a factor of two. This leads to quantum-safe parameters that are double those given in Table 1, except for the key size, which would be slightly below 4 megabytes.


Table 1. Proposed parameters for the Wave signature scheme and 128 bits of (classical) security

(n, k, w)                    (5172, 3908, 4980)
(k_U, k_V)                   (2299, 1609)
Signature length (bits)      8326
Public key size (MBytes)     0.98
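The sizes in Table 1 can be cross-checked with simple arithmetic. The encoding assumptions below are ours and are not stated in this section: we assume that a signature consists of the $\lambda_0$-bit salt plus a length-$n$ ternary vector stored at $\log_2 3$ bits per symbol, and that the public key is a systematic-form $(n - k) \times k$ ternary matrix, with 1 MByte taken as $10^6$ bytes. Under these assumptions the arithmetic reproduces the figures of Table 1; the helper `wave_sizes` is hypothetical.

```python
import math

LOG2_3 = math.log2(3)  # bits of information per ternary symbol


def wave_sizes(n: int, k: int, salt_bits: int):
    """Hypothetical size estimates, assuming:
    - a signature is the salt plus a length-n ternary vector at log2(3) bits/trit;
    - the public key is a systematic-form (n - k) x k ternary matrix.
    Returns (signature size in bits, public key size in 10**6 bytes).
    """
    sig_bits = math.ceil(n * LOG2_3 + salt_bits)
    pk_megabytes = (n - k) * k * LOG2_3 / 8 / 10**6
    return sig_bits, pk_megabytes


n, k, w, k_U, k_V = 5172, 3908, 4980, 2299, 1609
assert k_U + k_V == k  # the code dimension is k_U + k_V

sig, pk = wave_sizes(n, k, salt_bits=128)
print(sig, round(pk, 2))      # → 8326 0.98
print(round(0.02464 * n, 1))  # → 127.4, i.e. c * n for the best attack

# Doubling the parameters (and, by assumption, the salt) for the quantum-safe variant.
sig_q, pk_q = wave_sizes(2 * n, 2 * k, salt_bits=256)
print(sig_q, round(pk_q, 2))  # → 16651 3.91, a key slightly below 4 MB
```

The quadratic growth of the key size is visible directly in the product $(n - k)k$.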


Rejection Sampling Cost. Each of the two steps of our decoding algorithm takes as parameter a weight distribution, respectively $\mathcal{D}_V$ and $\mathcal{D}_U$. For the parameters of Table 1, the first decoding step, with $\mathcal{D}_V$ a normalized Laplace distribution of mean $(1 - a)k_V$ and variance 18.81, yields a rejection rate of 11%. For the second decoding step, we also choose a Laplace distribution, whose mean and variance are optimized according to the output weight of the first step. In the average case, the rejection rate of this second step is 19%. Overall, this amounts to one rejection every 3 or 4 signatures.
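The overall rate can be recovered from the two per-step rates under the simplifying assumption (ours) that the two rejection tests behave independently, so that the acceptance probabilities multiply. The helper name is hypothetical.

```python
def overall_rejection(step_rates):
    """Overall rejection probability when each step accepts independently
    (an assumption) with probability 1 - rate."""
    accept = 1.0
    for rate in step_rates:
        accept *= 1.0 - rate
    return 1.0 - accept


rej = overall_rejection([0.11, 0.19])
print(f"rejection rate ~ {rej:.3f}, i.e. one rejection every {1 / rej:.2f} signatures")
# → rejection rate ~ 0.279, i.e. one rejection every 3.58 signatures
```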

10 Concluding Remarks and Further Work 

We have presented Wave, the first code-based “hash-and-sign” signature scheme which strictly follows the GPV strategy [GPV08]. This strategy provides a very high level of security, but because of the multiple constraints it imposes, very few schemes have managed to comply with it. For instance, only one such scheme based on hard lattice problems [FHK+] was submitted to the recent NIST standardization effort. Our scheme is secure under two assumptions from coding theory. Both of those assumptions relate closely to hard decoding problems. Using rejection sampling, we have shown how to efficiently avoid key leakage from any number of signatures. The main purpose of our work was to propose this new scheme and assess its security. Still, it raises a few issues and admits extensions that are of interest.

Decoding Problems. The message security of Wave relates to the hardness of finding a codeword far from a given word. We derived a solver from existing decoding techniques, namely ISD [Pra62, Dum91], but the later refinements of ISD [MMT11, BJMM12], which successfully improved decoding in the close-codeword setting, fail for high weights. Similarly, multiple target decoders [Sen11] are ineffective here. We believe the problem is exponential by nature, but further studies certainly need to be conducted to understand whether, and by how much, the exponent can be lowered by new techniques.

Distinguishability. Deciding whether a matrix is a parity-check matrix of a generalized (U, U + V) code is also a new problem. As shown in [DST17b], it is hard in the worst case since the problem is NP-complete. In the binary case, (U, U + V) codes have a large hull dimension for some sets of parameters, which are precisely those used in [DST17b]. In the ternary case, the admissible generalized (U, U + V) codes do not suffer from this flaw. The freedom in the choice of the diagonal matrices $D_i$ is very likely to make the distinguishing problem much harder for generalized (U, U + V) codes than for plain (U, U + V) codes. Coming up with non-metric-based distinguishers in the generalized case seems a tantalizing problem here.

Rejection Sampling. Rejection sampling in our algorithm is relatively unobtrusive: a rejection every 
few signatures with a crude tuning of the decoder. We believe that it can be further improved. 
Our decoding has two steps. Each step is parametrized by a weight distribution which conditions 
the output weight distribution. We believe that we can tune those distributions to reduce the 
probability of rejection to an arbitrarily small value. This task requires a better understanding of 
the distributions involved. This could offer an interesting trade-off in which the designer/signer 
would have to precompute and store a set of distributions but in exchange would produce a signing 
algorithm that emulates a uniform distribution without rejection sampling. 
A Proofs for §6

A.1 List Emulation

In the security proof, we need to build lists of indices (salts) in $\mathbb{F}_2^{\lambda_0}$. Those lists have size $q_{sign}$, the maximum number of signature queries allowed to the adversary, a number which is possibly very large. For each message $m$ which is either hashed or signed in the game, we need to be able to

— create a list $L_m$ of $q_{sign}$ random elements of $\mathbb{F}_2^{\lambda_0}$ when calling the constructor new list();

— pick an element in $L_m$ using the method $L_m$.next(); this element can be picked only once;

— decide whether or not a given salt $r$ is in $L_m$ when calling $L_m$.contains(r).

The straightforward way to achieve this is to draw $q_{sign}$ random numbers when the list is constructed; this has to be done once for each different message $m$ used in the game. This may result in a quadratic cost $q_{hash} q_{sign}$ just to build the lists. Once the lists are constructed, and assuming they are stored in a proper data structure (a balanced search tree, for instance), picking an element or testing membership has a cost of at most $O(\log q_{sign})$, that is, at most linear in the security parameter $\lambda$.


class list
  elt, index
  list()
    index ← 0
    for i = 1, ..., q_sign
      elt[i] ← randint(2^λ0)
  method list.next()
    index ← index + 1
    return elt[index]
  method list.contains(r)
    return r ∈ {elt[i], 1 ≤ i ≤ q_sign}

Fig. 13. Standard implementation of the list operations.


Note that in our game we condition on the event that all elements of $L_m$ are different. This implies that $L_m$ is now obtained by choosing uniformly at random among the subsets of size $q_{sign}$ of $\mathbb{F}_2^{\lambda_0}$. We wish to emulate the list operations without ever constructing the lists explicitly, in such a way that the probabilistic model for $L_m$.next() and $L_m$.contains(r) stays the same as above (but again conditioned on the event that all elements of $L_m$ are different). For this purpose, we want to ensure that at any time we call either $L_m$.contains(r) or $L_m$.next() we have

$$\mathbb{P}(L_m.\mathrm{contains}(r) = \mathrm{true}) = \mathbb{P}(r \in L_m \mid Q) \tag{41}$$

$$\mathbb{P}(r = L_m.\mathrm{next}()) = p(r \mid Q) \tag{42}$$

for every $r \in \mathbb{F}_2^{\lambda_0}$. Here $Q$ represents the queries to $r$ made so far and whether or not these $r$'s belong to $L_m$. Queries to $r$ can be made through two different calls. The first one is a call of the form Sign(m) when it chooses $r$ during the random assignment $r \leftarrow \{0,1\}^{\lambda_0}$. This results in a call to Hash(m, r), which itself queries whether $r$ belongs to $L_m$ or not through the call $L_m$.contains(r). The answer is necessarily positive in this case. The second way to query $r$ is by calling Hash(m, r) directly. In this case, both answers true and false are possible. $p(r \mid Q)$ represents the probability distribution of $L_m$.next() in the above implementation of the list operations, given the previous queries $Q$.

A convenient way to represent $Q$ is through three lists $S$, $H_{true}$ and $H_{false}$. $S$ is the list of $r$'s that have been queried through a call Sign(m); they necessarily belong to $L_m$. $H_{true}$ is the set of $r$'s that have not been queried so far through a call to Sign(m) but have been queried through a direct call Hash(m, r) for which $L_m$.contains(r) returned true. $H_{false}$ is the list of $r$'s that have been queried by a call of the form Hash(m, r) for which $L_m$.contains(r) returned false.
We clearly have

$$\mathbb{P}(r \in L_m \mid Q) = 0 \quad \text{if } r \in H_{false} \tag{43}$$

$$\mathbb{P}(r \in L_m \mid Q) = 1 \quad \text{if } r \in S \cup H_{true} \tag{44}$$

$$\mathbb{P}(r \in L_m \mid Q) = \frac{q_{sign} - |S| - |H_{true}|}{2^{\lambda_0} - |S| - |H_{true}| - |H_{false}|} \quad \text{else.} \tag{45}$$

To compute the probability distribution $p(r \mid Q)$, it is helpful to notice that

$$\mathbb{P}(L_m.\mathrm{next}() \text{ outputs an element of } H_{true}) = \frac{|H_{true}|}{q_{sign} - |S|}. \tag{46}$$

This can be used to derive $p(r \mid Q)$ as follows:

$$p(r \mid Q) = 0 \quad \text{if } r \in H_{false} \cup S \tag{47}$$

$$p(r \mid Q) = \frac{1}{q_{sign} - |S|} \quad \text{if } r \in H_{true} \tag{48}$$

$$p(r \mid Q) = \frac{q_{sign} - |S| - |H_{true}|}{(q_{sign} - |S|)(2^{\lambda_0} - |H_{true}| - |S| - |H_{false}|)} \quad \text{else.} \tag{49}$$

(47) is obvious. (48) follows from (46) and the fact that all elements of $H_{true}$ have the same probability of being chosen as the return value of $L_m$.next(). (49) follows by a similar reasoning, arguing (i) that all the elements of $\mathbb{F}_2^{\lambda_0} \setminus (S \cup H_{true} \cup H_{false})$ have the same probability of being chosen as the return value of $L_m$.next(), and (ii) that the probability that $L_m$.next() outputs an element of $\mathbb{F}_2^{\lambda_0} \setminus (S \cup H_{true} \cup H_{false})$ is the probability that it does not output an element of $H_{true}$, which is

$$1 - \frac{|H_{true}|}{q_{sign} - |S|} = \frac{q_{sign} - |S| - |H_{true}|}{q_{sign} - |S|}.$$

Figure 14 explains how we perform the emulation of the list operations so that they behave like the genuine list operations specified above. The idea is to create and to operate explicitly on the lists $S$, $H_{true}$ and $H_{false}$ described earlier. We have chosen there

$$\beta = \frac{q_{sign} - |H_{true}| - |S|}{2^{\lambda_0} - |H_{true}| - |S| - |H_{false}|}, \qquad \gamma = \frac{|H_{true}|}{q_{sign} - |S|};$$

we also assume that when we call randomPop() on a list, it outputs an element of the list uniformly at random and removes this element from it. The method push adds an element to a list. The procedure rand() picks a real number between 0 and 1 uniformly at random.


class list
  H_true, H_false, S
  list()
    H_true ← ∅
    H_false ← ∅
    S ← ∅
  method list.contains(r)
    if r ∉ H_true ∪ H_false ∪ S
      if rand() < β
        H_true.push(r)
      else
        H_false.push(r)
    return r ∈ H_true ∪ S
  method list.next()
    if rand() < γ
      r ← H_true.randomPop()
    else
      r ← uniform element of F_2^λ0 \ (H_true ∪ S ∪ H_false)
    S.push(r)
    return r

Fig. 14. Emulation of the list operations.


The correctness of this emulation follows directly from the calculations given above. For instance, the correctness of the call $L_m$.next() follows from the fact that with probability $\frac{|H_{true}|}{q_{sign} - |S|} = \gamma$ it outputs an element of $H_{true}$ chosen uniformly at random (see (46)). In such a case, the corresponding element has to be moved from $H_{true}$ to $S$ (since it has now been queried through a call to Sign(m)). The correctness of $L_m$.contains(r) is a direct consequence of the formulas for $\mathbb{P}(r \in L_m \mid Q)$ given in (43), (44) and (45). All push, pop and membership-testing operations above can be implemented in time proportional to $\lambda_0$.
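The emulation of Fig. 14 can be rendered as a small Python class. This is our own sketch, not the authors' code: salts are modelled as integers in `range(2**lambda0)`, the class name `LazySaltList` is hypothetical, and sampling a fresh salt outside $S \cup H_{true} \cup H_{false}$ is done by rejection sampling, which is efficient as long as these sets are small compared with $2^{\lambda_0}$. Only the three sets are stored, so the cost depends on the number of queries rather than on $q_{sign}$.

```python
from __future__ import annotations

import random
from typing import Optional


class LazySaltList:
    """Hypothetical Python rendering of the list emulation of Fig. 14.

    Salts are integers in range(2**lambda0). The list L_m itself is never
    built: only S, H_true and H_false are stored, as in the figure.
    """

    def __init__(self, q_sign: int, lambda0: int, rng: Optional[random.Random] = None):
        self.q_sign = q_sign
        self.universe = 2**lambda0
        self.rng = rng if rng is not None else random.Random()
        self.h_true: set[int] = set()   # answered True by contains(), not yet output by next()
        self.h_false: set[int] = set()  # answered False by contains()
        self.s: set[int] = set()        # already output by next(), i.e. used by Sign

    def _seen(self, r: int) -> bool:
        return r in self.h_true or r in self.h_false or r in self.s

    def _beta(self) -> float:
        # Equation (45): P(r in L_m | Q) for a salt that was never queried.
        free = self.universe - len(self.h_true) - len(self.s) - len(self.h_false)
        return (self.q_sign - len(self.h_true) - len(self.s)) / free

    def _gamma(self) -> float:
        # Equation (46): probability that next() returns an element of H_true.
        return len(self.h_true) / (self.q_sign - len(self.s))

    def contains(self, r: int) -> bool:
        if not self._seen(r):
            if self.rng.random() < self._beta():
                self.h_true.add(r)
            else:
                self.h_false.add(r)
        return r in self.h_true or r in self.s

    def next(self) -> int:
        if len(self.s) >= self.q_sign:
            raise IndexError("all q_sign salts of this list have been used")
        if self.rng.random() < self._gamma():
            # randomPop(): uniform element of H_true, moved to S below.
            r = self.rng.choice(sorted(self.h_true))
            self.h_true.remove(r)
        else:
            # Uniform salt outside S, H_true and H_false, by rejection sampling.
            r = self.rng.randrange(self.universe)
            while self._seen(r):
                r = self.rng.randrange(self.universe)
        self.s.add(r)
        return r
```

For example, `LazySaltList(q_sign=2**20, lambda0=128)` costs nothing at construction time, whereas the implementation of Fig. 13 would draw $2^{20}$ salts upfront.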
A.2 Proof of Lemma 3

The goal of this subsection is to estimate the probability of a collision in a signature query for a message $m$ when we allow at most $q_{sign}$ queries (the event $F$ in the security proof), and to deduce Lemma 3 of §6.3. We recall that in $\mathcal{S}_{code}$, for each signature query, we pick $r$ uniformly at random in $\{0,1\}^{\lambda_0}$. The probability we are looking for is then bounded by the probability of picking the same $r$ at least twice after $q_{sign}$ draws. The following lemma will be useful.

Lemma 7. The probability of having at least one collision after drawing uniformly and independently $t$ elements in a set of size $n$ is upper bounded by $t^2/n$ for sufficiently large $n$ and $t^2 < n$.


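Proof. The probability of no collision after drawing independently $t$ elements among $n$ is

$$p_{n,t} = \prod_{i=0}^{t-1} \left(1 - \frac{i}{n}\right) \ge 1 - \sum_{i=0}^{t-1} \frac{i}{n} = 1 - \frac{t(t-1)}{2n},$$

from which we easily get $1 - p_{n,t} \le t^2/n$, concluding the proof.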
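The bound of Lemma 7 can be checked exactly on small instances with rational arithmetic. This is an illustrative sketch of ours (function names hypothetical), computing $p_{n,t}$ as the product above and verifying $1 - p_{n,t} \le t(t-1)/(2n) \le t^2/n$.

```python
from fractions import Fraction


def no_collision_prob(n: int, t: int) -> Fraction:
    """Exact p_{n,t} = prod_{i<t} (1 - i/n): t uniform draws from n items are all distinct."""
    p = Fraction(1)
    for i in range(t):
        p *= Fraction(n - i, n)
    return p


def check_lemma7(n: int, t: int) -> bool:
    """Check 1 - p_{n,t} <= t(t-1)/(2n) <= t^2/n."""
    collision = 1 - no_collision_prob(n, t)
    return collision <= Fraction(t * (t - 1), 2 * n) <= Fraction(t * t, n)


# Several values of t with t^2 < n = 2^16.
print(all(check_lemma7(2**16, t) for t in (1, 2, 10, 100, 255)))  # → True
```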


In our case, the probability of the event $F$ is bounded by the previous probability for $t = q_{sign}$ and $n = 2^{\lambda_0}$, so, with $\lambda_0 = \lambda + 2\log_2 q_{sign}$, we can conclude that

$$\mathbb{P}(F) \le \frac{q_{sign}^2}{2^{\lambda_0}} = \frac{1}{2^{\lambda_0 - 2\log_2(q_{sign})}} = \frac{1}{2^\lambda},$$

which concludes the proof of Lemma 3.


B Proof of Proposition 3

Our goal in this section is to prove Proposition 3 of §5.1.

Probabilistic notation. Recall that we denote by $\mathcal{U}_w$ the uniform distribution over $S_w$.

Vector notation. If $e \in \mathbb{F}_3^n$ and $i \in [1, n]$, we denote by $e(i)$ the $i$-th component of $e$. Furthermore, we split $e$ as $(e_1, e_2)$, where $e_1$ (resp. $e_2$) is the vector formed by its first (resp. last) $n/2$ coordinates, and we define $e^{(i)}$ for all $i \in [1, n/2]$ as

$$e^{(i)} \triangleq \begin{pmatrix} e_1(i) \\ e_2(i) \end{pmatrix}.$$

We start by computing here a distribution that will be useful in the following. 


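Proposition 11. Let $e = (e_1,e_2)$ be a random variable whose distribution is $\mathcal{U}_w$, where the $e_i$'s are vectors of $\mathbb{F}_3^{n/2}$. We have for all $j \in [0,n/2]$:
$$\mathbb{P}_e(|e_1 - e_2| = j) = \mathbb{P}_e(|e_1 + e_2| = j) = \frac{1}{\binom{n}{w}2^w}\sum_{\substack{p=0\\ w+p\equiv 0 \bmod 2}}^{j}\binom{n/2}{\frac{w+p}{2}}\binom{\frac{w+p}{2}}{j}\binom{j}{p}2^{\frac{w+3p}{2}}.$$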


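Lemma 8. Let $e = (e_1,e_2)$ be a word of $S_w$. The weight $|e_1 + e_2|$ is given by the number of $i \in [1,n/2]$ such that $e^{(i)}$ is one of the following elements (that is, such that $e_1(i) + e_2(i) \neq 0$ in $\mathbb{F}_3$):
$$\begin{pmatrix}-1\\0\end{pmatrix},\ \begin{pmatrix}1\\0\end{pmatrix},\ \begin{pmatrix}0\\-1\end{pmatrix},\ \begin{pmatrix}0\\1\end{pmatrix},\ \begin{pmatrix}1\\1\end{pmatrix},\ \begin{pmatrix}-1\\-1\end{pmatrix}.$$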
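Proof (Proposition 11). Let us enumerate the number of errors $e = (e_1,e_2) \in S_w$ which verify $|e_1 + e_2| = j$. For this, let us introduce the following intermediary weights:
$$|e|_1 = \left|\left\{i \in [1,n/2] : e^{(i)} \in \left\{\begin{pmatrix}-1\\0\end{pmatrix},\begin{pmatrix}1\\0\end{pmatrix},\begin{pmatrix}0\\-1\end{pmatrix},\begin{pmatrix}0\\1\end{pmatrix}\right\}\right\}\right|,$$
$$|e|_2 = \left|\left\{i \in [1,n/2] : e^{(i)} \in \left\{\begin{pmatrix}1\\1\end{pmatrix},\begin{pmatrix}-1\\-1\end{pmatrix}\right\}\right\}\right|,\qquad |e|_3 = \left|\left\{i \in [1,n/2] : e^{(i)} \in \left\{\begin{pmatrix}1\\-1\end{pmatrix},\begin{pmatrix}-1\\1\end{pmatrix}\right\}\right\}\right|.$$
It is easily verified that
$$|e| = |e|_1 + 2|e|_2 + 2|e|_3 \quad\text{and}\quad |e_1 + e_2| = |e|_1 + |e|_2.$$
In particular, if $|e|_1 = p$ and $|e|_2 = j - p$, then $|e|_3 = \frac{w+p}{2} - j$, which requires $w + p \equiv 0 \bmod 2$. Each of the $p$ positions counted by $|e|_1$ can take 4 values and each of the $\frac{w-p}{2}$ positions counted by $|e|_2$ or $|e|_3$ can take 2 values. In this way,
$$\mathbb{P}_e(|e_1 + e_2| = j) = \sum_{p=0}^{j}\mathbb{P}_e\left(|e|_2 = j - p,\ |e|_1 = p\right) = \frac{1}{\binom{n}{w}2^w}\sum_{\substack{p=0\\ w+p\equiv 0 \bmod 2}}^{j}\binom{n/2}{p}\binom{n/2-p}{j-p}\binom{n/2-j}{\frac{w+p}{2}-j}2^{\frac{w+3p}{2}}.$$
It now remains to perform the following computation to conclude:
$$\binom{n/2}{p}\binom{n/2-p}{j-p}\binom{n/2-j}{\frac{w+p}{2}-j} = \frac{(n/2)!}{(n/2-p)!\,p!}\cdot\frac{(n/2-p)!}{(n/2-j)!\,(j-p)!}\cdot\frac{(n/2-j)!}{\left(\frac{n-w-p}{2}\right)!\left(\frac{w+p}{2}-j\right)!}$$
$$= \frac{(n/2)!}{p!\,(j-p)!\left(\frac{n-w-p}{2}\right)!\left(\frac{w+p}{2}-j\right)!} = \frac{(n/2)!}{\left(\frac{n-w-p}{2}\right)!\left(\frac{w+p}{2}\right)!}\cdot\frac{\left(\frac{w+p}{2}\right)!}{\left(\frac{w+p}{2}-j\right)!\,j!}\cdot\frac{j!}{(j-p)!\,p!} = \binom{n/2}{\frac{w+p}{2}}\binom{\frac{w+p}{2}}{j}\binom{j}{p}.$$
Finally, $\mathbb{P}_e(|e_1 - e_2| = j) = \mathbb{P}_e(|e_1 + e_2| = j)$ because $(e_1,e_2) \mapsto (e_1,-e_2)$ is a bijection of $S_w$, which concludes the proof.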
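Proposition 11 can be checked by exhaustive enumeration for small lengths. The sketch below is an illustration only: the helper names are hypothetical, and ternary symbols are encoded as 0, 1, 2 with 2 standing for $-1$. It compares the exact distribution of $|e_1 \pm e_2|$ over $S_w$ with the closed formula.

```python
# Illustrative sketch: hypothetical helper names; symbols 0, 1, 2 with 2 standing for -1.
from fractions import Fraction
from itertools import product
from math import comb


def weight(v) -> int:
    """Hamming weight of a ternary vector."""
    return sum(1 for a in v if a % 3 != 0)


def prop11_formula(n: int, w: int, j: int) -> Fraction:
    """Closed formula of Proposition 11 for P(|e1 + e2| = j), e uniform in S_w."""
    half = n // 2
    total = 0
    for p in range(j + 1):
        if (w + p) % 2:  # |e|_3 = (w + p)/2 - j must be an integer
            continue
        s = (w + p) // 2
        total += comb(half, s) * comb(s, j) * comb(j, p) * 2 ** ((w + 3 * p) // 2)
    return Fraction(total, comb(n, w) * 2**w)


def prop11_bruteforce(n: int, w: int, sign: int) -> dict:
    """Exact distribution of |e1 + sign * e2| when e is uniform over the weight-w words of F_3^n."""
    half = n // 2
    words = [e for e in product(range(3), repeat=n) if weight(e) == w]
    counts = {}
    for e in words:
        j = weight([(e[i] + sign * e[half + i]) % 3 for i in range(half)])
        counts[j] = counts.get(j, 0) + 1
    return {j: Fraction(c, len(words)) for j, c in counts.items()}


n, w = 6, 3
for sign in (1, -1):
    dist = prop11_bruteforce(n, w, sign)
    print(sign, all(dist.get(j, 0) == prop11_formula(n, w, j) for j in range(n // 2 + 1)))
```

Exact fractions are used so that the comparison is an equality test rather than a floating-point approximation.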


Let us now recall Proposition 3, which we want to prove. Recall first that we denote by $H_{\mathrm{pk}}$ the random matrix chosen as the public parity-check matrix of our scheme. It is obtained as
$$H_{\mathrm{pk}} = SH_{\mathrm{sk}}P \quad\text{with}\quad H_{\mathrm{sk}} = \begin{pmatrix} H_UD_4M & -H_UD_2M\\ H_VD_3M & -H_VD_1M\end{pmatrix},$$
where $D_1,\dots,D_4$ are four diagonal matrices which verify (12) and $M = (D_1D_4 - D_3D_2)^{-1}$, $S$ is chosen uniformly at random among the invertible ternary matrices of size $(n-k)\times(n-k)$, $H_U$ is chosen uniformly at random among the ternary matrices of size $(n/2 - k_U)\times n/2$, $H_V$ is chosen uniformly at random among the ternary matrices of size $(n/2 - k_V)\times n/2$ and $P$ is chosen uniformly at random among the permutation matrices of size $n\times n$.
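Proposition 3. Let $\mathcal{D}_w^{H_{\mathrm{pk}}}$ be the distribution of the syndromes $eH_{\mathrm{pk}}^T$ when $e$ is drawn uniformly at random among the ternary vectors of weight $w$, and let $\mathcal{U}$ be the uniform distribution over the syndrome space $\mathbb{F}_3^{n-k}$. We have
$$\mathbb{E}_{H_{\mathrm{pk}}}\left(\rho\left(\mathcal{D}_w^{H_{\mathrm{pk}}},\mathcal{U}\right)\right) \leq \frac{1}{2}\sqrt{\varepsilon}$$
with
$$\begin{aligned}\varepsilon ={}& \frac{3^{n-k}}{\binom{n}{w}2^{w}} + 3^{n/2-k_V}\sum_{j=0}^{n/2}\frac{\left(\sum_{p=0,\ w+p\equiv 0 \bmod 2}^{j}\binom{n/2}{\frac{w+p}{2}}\binom{\frac{w+p}{2}}{j}\binom{j}{p}2^{\frac{w+3p}{2}}\right)^{2}}{\binom{n/2}{j}2^{j}\binom{n}{w}^{2}2^{2w}}\\ &+ 3^{n/2-k_U}\sum_{\ell=0}^{n_I}\frac{\binom{n_I}{\ell}2^{\ell}}{\binom{n}{w}^{2}2^{2w}}\sum_{j=0}^{n/2-n_I}\frac{1}{\binom{n/2-n_I}{j}2^{j}}\left(\sum_{h=0}^{j}\sum_{p=0}^{n/2-n_I-j}\binom{n/2-n_I}{h}\binom{n/2-n_I-h}{j-h}\binom{n/2-n_I-j}{p}\binom{n_I}{w-\ell-j-h-2p}2^{w-\ell+j-2h-p}\right)^{2}\end{aligned}\qquad (22)$$
where $n_I$ is the number of $V$ blocks of type I (Definition 3) and binomial coefficients with a negative lower index are taken to be zero.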


Proposition 3 is based on two lemmas. The first one is the following: 

Lemma 9. Let $y$ be a nonzero vector of $\mathbb{F}_3^n$ and $s$ an arbitrary element of $\mathbb{F}_3^r$. We choose a matrix $H$ of size $r\times n$ uniformly at random among the set of $r\times n$ ternary matrices. In this case
$$\mathbb{P}(yH^T = s) = \frac{1}{3^r}.$$

Proof. The coefficient of $H$ at row $i$ and column $j$ is denoted by $h_{ij}$, whereas the coefficients of $y$ and $s$ are denoted by $y_j$ and $s_i$ respectively. The probability we are looking for is the probability to have
$$\sum_{j} h_{ij}y_j = s_i \qquad (50)$$
for all $i$ in $\{1,\dots,r\}$. Since $y$ is nonzero, it has at least one nonzero coordinate. Without loss of generality, we may assume that $y_1 \neq 0$. We may rewrite (50) as $h_{i1} = y_1^{-1}\left(s_i - \sum_{j>1}h_{ij}y_j\right)$. This event happens with probability $\frac{1}{3}$ for a given $i$ and with probability $\frac{1}{3^r}$ for all $r$ events simultaneously, due to the independence of the $h_{ij}$'s.
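For tiny sizes, Lemma 9 can be confirmed by brute force. The sketch below (the function name is hypothetical) counts the $r\times n$ ternary matrices $H$ with $yH^T = s$; by the lemma this count should be $3^{rn}/3^r$ for every $s$ whenever $y \neq 0$.

```python
# Illustrative sketch: the function name is hypothetical; symbols 0, 1, 2 with 2 standing for -1.
from itertools import product


def lemma9_count(y, s, r: int) -> int:
    """Number of r x n ternary matrices H such that y H^T = s."""
    n = len(y)
    count = 0
    for entries in product(range(3), repeat=r * n):
        rows = [entries[i * n:(i + 1) * n] for i in range(r)]
        syndrome = tuple(sum(h * c for h, c in zip(row, y)) % 3 for row in rows)
        count += syndrome == tuple(s)
    return count


# For y != 0, Lemma 9 predicts 3^(r n) / 3^r matrices for every s.
print(lemma9_count((1, 2), (0, 1), r=2), 3 ** (2 * 2) // 3**2)
```

When $y = 0$ the count is $3^{rn}$ for $s = 0$ and $0$ otherwise, which is why the lemma needs $y \neq 0$.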

Let us now consider the following lemma, which is a variation of the leftover hash lemma:

Lemma 1. Consider a finite family $\mathcal{H} = (h_i)_{i\in I}$ of functions from a finite set $E$ to a finite set $F$. Denote by $\varepsilon$ the bias of the collision probability, i.e. the quantity such that
$$\mathbb{P}_{h,e,e'}(h(e) = h(e')) = \frac{1}{|F|}(1+\varepsilon),$$
where $h$ is drawn uniformly at random in $\mathcal{H}$, and $e$ and $e'$ are drawn independently and uniformly at random in $E$. Let $\mathcal{U}$ be the uniform distribution over $F$ and $\mathcal{D}(h)$ be the distribution of the outputs $h(e)$ when $e$ is chosen uniformly at random in $E$. We have
$$\mathbb{E}_h\{\rho(\mathcal{D}(h),\mathcal{U})\} \leq \frac{1}{2}\sqrt{\varepsilon}.$$

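Proof. Let $q_{h,f}$ be the probability distribution of the discrete random variable $(h_0, h_0(e))$, where $h_0$ is drawn uniformly at random in $\mathcal{H}$ and $e$ is drawn uniformly at random in $E$ (i.e. $q_{h,f} = \mathbb{P}_{h_0,e}(h_0 = h, h_0(e) = f)$). By definition of the statistical distance we have
$$\mathbb{E}_h\{\rho(\mathcal{D}(h),\mathcal{U})\} = \frac{1}{|\mathcal{H}|}\sum_{h\in\mathcal{H}}\frac{1}{2}\sum_{f\in F}\left|\mathbb{P}_e(h(e) = f) - \frac{1}{|F|}\right| = \frac{1}{2}\sum_{(h,f)\in\mathcal{H}\times F}\left|q_{h,f} - \frac{1}{|\mathcal{H}|\cdot|F|}\right|. \qquad (51)$$
Using the Cauchy-Schwarz inequality, we obtain
$$\sum_{(h,f)\in\mathcal{H}\times F}\left|q_{h,f} - \frac{1}{|\mathcal{H}|\cdot|F|}\right| \leq \sqrt{\sum_{(h,f)\in\mathcal{H}\times F}\left(q_{h,f} - \frac{1}{|\mathcal{H}|\cdot|F|}\right)^2}\cdot\sqrt{|\mathcal{H}|\cdot|F|}. \qquad (52)$$
Let us observe now that
$$\sum_{(h,f)\in\mathcal{H}\times F}\left(q_{h,f} - \frac{1}{|\mathcal{H}|\cdot|F|}\right)^2 = \sum_{h,f}q_{h,f}^2 - 2\sum_{h,f}\frac{q_{h,f}}{|\mathcal{H}|\cdot|F|} + \sum_{h,f}\frac{1}{|\mathcal{H}|^2\cdot|F|^2} = \sum_{h,f}q_{h,f}^2 - \frac{1}{|\mathcal{H}|\cdot|F|}. \qquad (53)$$
Consider for $i \in \{0,1\}$ independent random variables $h_i$ and $e_i$ that are drawn uniformly at random in $\mathcal{H}$ and $E$ respectively. We continue this computation by noticing now that
$$\sum_{h,f}q_{h,f}^2 = \sum_{h,f}\mathbb{P}_{h_0,e_0}(h_0 = h, h_0(e_0) = f)\,\mathbb{P}_{h_1,e_1}(h_1 = h, h_1(e_1) = f) = \mathbb{P}_{h_0,h_1,e_0,e_1}\left(h_0 = h_1,\ h_0(e_0) = h_1(e_1)\right) = \frac{\mathbb{P}_{h_0,e_0,e_1}(h_0(e_0) = h_0(e_1))}{|\mathcal{H}|} = \frac{1+\varepsilon}{|\mathcal{H}|\cdot|F|}. \qquad (54)$$
By substituting for $\sum_{h,f}q_{h,f}^2$ the expression obtained in (54) into (53), and then back into (52) and (51), we finally obtain
$$\mathbb{E}_h\{\rho(\mathcal{D}(h),\mathcal{U})\} \leq \frac{1}{2}\sqrt{\frac{1+\varepsilon}{|\mathcal{H}|\cdot|F|} - \frac{1}{|\mathcal{H}|\cdot|F|}}\cdot\sqrt{|\mathcal{H}|\cdot|F|} = \frac{1}{2}\sqrt{\varepsilon}.$$
This finishes the proof of our lemma. □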
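To illustrate Lemma 1, the following sketch takes as a toy instance (an assumption made here, not the scheme's parameters) the family $h(e) = eH^T$ with $H$ uniform among the $r\times n$ ternary matrices and $E = S_w$. It computes $\varepsilon$ and $\mathbb{E}_h\{\rho(\mathcal{D}(h),\mathcal{U})\}$ exactly and compares the latter with $\frac{1}{2}\sqrt{\varepsilon}$. For $n = 3$, $r = 1$, $w = 2$ one gets $\varepsilon = 1/6$ and an expected distance of $4/27 \approx 0.148$, below $\frac{1}{2}\sqrt{1/6} \approx 0.204$.

```python
# Illustrative sketch: toy family h(e) = e H^T; names and sizes are assumptions, not the scheme's.
from fractions import Fraction
from itertools import product
from math import sqrt


def weight(v) -> int:
    """Hamming weight of a ternary vector (symbols 0, 1, 2)."""
    return sum(1 for a in v if a % 3 != 0)


def leftover_hash_check(n: int, r: int, w: int):
    """Exact epsilon and E_h[rho(D(h), U)] for h(e) = e H^T, H uniform among r x n ternary
    matrices and e uniform in S_w."""
    E = [e for e in product(range(3), repeat=n) if weight(e) == w]
    F = list(product(range(3), repeat=r))
    matrices = list(product(range(3), repeat=r * n))
    collision = Fraction(0)
    mean_distance = Fraction(0)
    for entries in matrices:
        rows = [entries[i * n:(i + 1) * n] for i in range(r)]
        counts = dict.fromkeys(F, 0)
        for e in E:
            counts[tuple(sum(h * c for h, c in zip(row, e)) % 3 for row in rows)] += 1
        # P_{e,e'}(h(e) = h(e')) for this h, then rho(D(h), U) = 1/2 sum_f |D(h)(f) - 1/|F||
        collision += Fraction(sum(c * c for c in counts.values()), len(E) ** 2)
        mean_distance += Fraction(1, 2) * sum(
            abs(Fraction(c, len(E)) - Fraction(1, len(F))) for c in counts.values())
    collision /= len(matrices)
    mean_distance /= len(matrices)
    eps = collision * len(F) - 1  # P(h(e) = h(e')) = (1 + eps) / |F|
    return eps, mean_distance


eps, dist = leftover_hash_check(3, 1, 2)
print(eps, dist, float(dist) <= 0.5 * sqrt(float(eps)))
```

For this linear family, $\varepsilon = (3^r - 1)/|S_w|$, since two distinct inputs collide with probability $3^{-r}$ by Lemma 9.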


In order to use this lemma to bound the statistical distance we are interested in, we now perform the following computation.

Lemma 10. Assume that $x$ and $y$ are random vectors of $S_w$ that are drawn independently and uniformly at random in this set. We have
$$\mathbb{P}_{H_{\mathrm{pk}},x,y}\left(xH_{\mathrm{pk}}^T = yH_{\mathrm{pk}}^T\right) \leq \frac{1}{3^{n-k}}(1+\varepsilon)$$
with $\varepsilon$ given in Proposition 3.
Proof. Recall that $H_{\mathrm{pk}}$ is obtained as
$$H_{\mathrm{pk}} = SH_{\mathrm{sk}}P \quad\text{with}\quad H_{\mathrm{sk}} = \begin{pmatrix} H_UD_4M & -H_UD_2M\\ H_VD_3M & -H_VD_1M\end{pmatrix},$$
where $D_1,\dots,D_4$ are four diagonal matrices which verify (12) in Definition 2 in §4.2 and $M = (D_1D_4 - D_3D_2)^{-1}$, $S$ is chosen uniformly at random among the invertible matrices of $\mathbb{F}_3^{(n-k)\times(n-k)}$, $H_U$ is chosen uniformly at random among $\mathbb{F}_3^{(n/2-k_U)\times n/2}$, $H_V$ is chosen uniformly at random among $\mathbb{F}_3^{(n/2-k_V)\times n/2}$ and $P$ is chosen uniformly at random among the permutation matrices of size $n\times n$. As $S$ is non-singular, $P$ is a permutation and the uniform distribution over $S_w$ is invariant under permutations of the coordinates, the probability of the event $xH_{\mathrm{pk}}^T = yH_{\mathrm{pk}}^T$ is the same as the probability of the event
$$\begin{pmatrix} H_UD_4M & -H_UD_2M\\ H_VD_3M & -H_VD_1M\end{pmatrix}x^T = \begin{pmatrix} H_UD_4M & -H_UD_2M\\ H_VD_3M & -H_VD_1M\end{pmatrix}y^T.$$
For a vector $x$ of $\mathbb{F}_3^n$, we denote in the following by $x_1$ (resp. $x_2$) the vector formed by its first (resp. last) $n/2$ coordinates. In other words, the probability we are looking for is
$$\mathbb{P}\left(\left((x_1 - y_1)D_4M - (x_2 - y_2)D_2M\right)H_U^T = 0,\ \left((x_1 - y_1)D_3M - (x_2 - y_2)D_1M\right)H_V^T = 0\right),$$
where the probability is taken over $H_U$, $H_V$, $x$, $y$. To compute the previous probability we use Lemma 9, which says that
$$\mathbb{P}_H(eH^T = 0) = \begin{cases}\frac{1}{3^{r}} & \text{if } e \neq 0,\\ 1 & \text{otherwise,}\end{cases} \qquad (55)$$
when $H$ is chosen uniformly at random among the $r\times m$ ternary matrices and $e \in \mathbb{F}_3^m$. This motivates us to distinguish between four disjoint events:

Event 1:

$\mathcal{E}_1 = \{(x_1 - y_1)D_4M = (x_2 - y_2)D_2M,\ (x_1 - y_1)D_3M \neq (x_2 - y_2)D_1M\}$

Event 2:

$\mathcal{E}_2 = \{(x_1 - y_1)D_4M \neq (x_2 - y_2)D_2M,\ (x_1 - y_1)D_3M = (x_2 - y_2)D_1M\}$

Event 3:

$\mathcal{E}_3 = \{(x_1 - y_1)D_4M \neq (x_2 - y_2)D_2M,\ (x_1 - y_1)D_3M \neq (x_2 - y_2)D_1M\}$

Event 4:

$\mathcal{E}_4 = \{(x_1 - y_1)D_4M = (x_2 - y_2)D_2M,\ (x_1 - y_1)D_3M = (x_2 - y_2)D_1M\}$

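Under these events we get, thanks to (55) and $k = k_U + k_V$:
$$\begin{aligned}\mathbb{P}_{H_{\mathrm{sk}},x,y}\left(xH_{\mathrm{sk}}^T = yH_{\mathrm{sk}}^T\right) &= \sum_{i=1}^{4}\mathbb{P}_{H_{\mathrm{sk}}}\left(xH_{\mathrm{sk}}^T = yH_{\mathrm{sk}}^T \mid \mathcal{E}_i\right)\mathbb{P}_{x,y}(\mathcal{E}_i)\\ &= \frac{\mathbb{P}_{x,y}(\mathcal{E}_1)}{3^{n/2-k_V}} + \frac{\mathbb{P}_{x,y}(\mathcal{E}_2)}{3^{n/2-k_U}} + \frac{\mathbb{P}_{x,y}(\mathcal{E}_3)}{3^{n-k}} + \mathbb{P}_{x,y}(\mathcal{E}_4)\\ &= \frac{1}{3^{n-k}}\left(\frac{\mathbb{P}(\mathcal{E}_1)}{3^{n/2-k_V-n+k}} + \frac{\mathbb{P}(\mathcal{E}_2)}{3^{n/2-k_U-n+k}} + \mathbb{P}(\mathcal{E}_3) + 3^{n-k}\mathbb{P}(\mathcal{E}_4)\right)\\ &\leq \frac{1}{3^{n-k}}\left(1 + 3^{n/2-k_U}\mathbb{P}(\mathcal{E}_1) + 3^{n/2-k_V}\mathbb{P}(\mathcal{E}_2) + 3^{n-k}\mathbb{P}(\mathcal{E}_4)\right),\end{aligned}\qquad (56)$$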
where we used for the last inequality the trivial upper bound $\mathbb{P}(\mathcal{E}_3) \leq 1$. Let us now upper-bound (or compute) the probabilities of the events $\mathcal{E}_1$, $\mathcal{E}_2$ and $\mathcal{E}_4$. For $\mathcal{E}_4$, recall that from the definition of admissible generalized $(U,U+V)$-codes, and especially the fact that $D = \begin{pmatrix}D_1 & D_2\\ D_3 & D_4\end{pmatrix}$ and $M$ are invertible, we clearly have
$$\mathcal{E}_4 = \{x_1 = y_1,\ x_2 = y_2\},$$
which easily gives
$$\mathbb{P}_{x,y}(\mathcal{E}_4) = \mathbb{P}(x = y) = \frac{1}{|S_w|} = \frac{1}{\binom{n}{w}2^w}. \qquad (57)$$

Let us now estimate the probability of $\mathcal{E}_2$, for which we derive the following upper bound:
$$\mathbb{P}(\mathcal{E}_2) \leq \mathbb{P}\left((x_1 - y_1)D_3M = (x_2 - y_2)D_1M\right).$$
But now, from the condition (12) of the definition of admissible generalized $(U,U+V)$-codes,
$$\forall i \in [1,n/2],\quad D_1(i,i)D_3(i,i) \neq 0,$$
and the fact that $M$ is an invertible diagonal matrix, we have
$$(x_1 - y_1)D_3M = (x_2 - y_2)D_1M \iff \forall i \in [1,n/2],\ (x_1 - y_1)(i) = \pm(x_2 - y_2)(i),$$
where the sign at position $i$ is $D_1(i,i)D_3(i,i)$ (recall that $a^{-1} = a$ for $a \in \{1,-1\}$). Let us notice that the distribution of a vector $x = (x_1,x_2)$ picked uniformly at random in $S_w$ is unchanged when some of its components are multiplied by $-1$. In this way we have
$$\mathbb{P}(\mathcal{E}_2) \leq \mathbb{P}(x_1 - y_1 = x_2 - y_2).$$

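To upper-bound $\mathbb{P}(x_1 - x_2 = y_1 - y_2)$, let us derive the distribution of $x_1 - x_2$. Conditionally on its weight, $x_1 - x_2$ is uniformly distributed, since the distribution of $x$ is invariant under permutations of the pairs $(x_1(i),x_2(i))$ and under multiplying both $x_1(i)$ and $x_2(i)$ by $-1$. Hence, for any $e \in \mathbb{F}_3^{n/2}$ of weight $w_e$,
$$\begin{aligned}\mathbb{P}(x_1 - x_2 = e) &= \mathbb{P}\left(x_1 - x_2 = e \mid |x_1 - x_2| = w_e\right)\mathbb{P}(|x_1 - x_2| = w_e)\\ &= \frac{1}{\binom{n/2}{w_e}2^{w_e}}\cdot\frac{1}{\binom{n}{w}2^w}\sum_{\substack{p=0\\ w+p\equiv 0 \bmod 2}}^{w_e}\binom{n/2}{\frac{w+p}{2}}\binom{\frac{w+p}{2}}{w_e}\binom{w_e}{p}2^{\frac{w+3p}{2}} \quad\text{(see Proposition 11)}.\end{aligned}\qquad (58)$$
From this we deduce, since $x$ and $y$ are independent, that
$$\begin{aligned}\mathbb{P}(x_1 - y_1 = x_2 - y_2) &= \sum_{j=0}^{n/2}\ \sum_{e\in\mathbb{F}_3^{n/2}:|e|=j}\mathbb{P}_x(x_1 - x_2 = e)^2\\ &= \sum_{j=0}^{n/2}\binom{n/2}{j}2^j\left(\frac{1}{\binom{n/2}{j}2^j\binom{n}{w}2^w}\sum_{\substack{p=0\\ w+p\equiv 0 \bmod 2}}^{j}\binom{n/2}{\frac{w+p}{2}}\binom{\frac{w+p}{2}}{j}\binom{j}{p}2^{\frac{w+3p}{2}}\right)^2 \quad\text{(by Eq. (58))}.\end{aligned}$$
Therefore we deduce that
$$\mathbb{P}(\mathcal{E}_2) \leq \sum_{j=0}^{n/2}\frac{\left(\sum_{p=0,\ w+p\equiv 0 \bmod 2}^{j}\binom{n/2}{\frac{w+p}{2}}\binom{\frac{w+p}{2}}{j}\binom{j}{p}2^{\frac{w+3p}{2}}\right)^2}{\binom{n/2}{j}2^{j}\binom{n}{w}^2 2^{2w}}. \qquad (59)$$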
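The right-hand side of (59) is in fact equal to $\mathbb{P}(x_1 - y_1 = x_2 - y_2)$, which makes it easy to test. The following sketch, with hypothetical helper names and toy lengths chosen only so that $S_w$ can be enumerated, compares the right-hand side of (59) with a brute-force computation of that probability.

```python
# Illustrative sketch: hypothetical helper names, toy lengths small enough to enumerate S_w.
from fractions import Fraction
from itertools import product
from math import comb


def weight(v) -> int:
    """Hamming weight of a ternary vector (symbols 0, 1, 2)."""
    return sum(1 for a in v if a % 3 != 0)


def p_weight_of_difference(n: int, w: int, j: int) -> Fraction:
    """P(|x1 - x2| = j) for x uniform in S_w, given by Proposition 11."""
    half = n // 2
    total = sum(comb(half, (w + p) // 2) * comb((w + p) // 2, j) * comb(j, p) * 2 ** ((w + 3 * p) // 2)
                for p in range(j + 1) if (w + p) % 2 == 0)
    return Fraction(total, comb(n, w) * 2**w)


def rhs_59(n: int, w: int) -> Fraction:
    """Right-hand side of (59): sum_j P(|x1 - x2| = j)^2 / (binom(n/2, j) 2^j)."""
    half = n // 2
    return sum((p_weight_of_difference(n, w, j) ** 2 / (comb(half, j) * 2**j) for j in range(half + 1)),
               Fraction(0))


def collision_of_differences(n: int, w: int) -> Fraction:
    """Brute-force P(x1 - x2 = y1 - y2) for x, y independent and uniform in S_w."""
    half = n // 2
    words = [x for x in product(range(3), repeat=n) if weight(x) == w]
    counts = {}
    for x in words:
        d = tuple((x[i] - x[half + i]) % 3 for i in range(half))
        counts[d] = counts.get(d, 0) + 1
    return Fraction(sum(c * c for c in counts.values()), len(words) ** 2)


print(collision_of_differences(6, 4), rhs_59(6, 4))
```

The brute force uses the identity $\mathbb{P}(X = Y) = \sum_d \mathbb{P}(X = d)^2$ for independent, identically distributed $X$ and $Y$.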


To upper-bound $\mathbb{P}(\mathcal{E}_1)$, let us first recall the following definition.
Definition 3 (number of V blocks of type I). In a generalized $(U,U+V)$ code of length $n$ associated to the 4-tuple of diagonal matrices of size $n/2$ $(D_1,D_2,D_3,D_4)$, the number of $V$ blocks of type I, which we denote by $n_I$, is defined by:
$$n_I = \left|\{1 \leq i \leq n/2 : D_2(i,i)D_4(i,i) = 0\}\right|.$$

In other words, from the fact that $M$ is invertible, the event $\mathcal{E}_1 = \{(x_1 - y_1)D_4M = (x_2 - y_2)D_2M,\ (x_1 - y_1)D_3M \neq (x_2 - y_2)D_1M\}$ is contained (up to a permutation of the indices of $x$ and $y$) in the event
$$\forall i \in [1,n_I],\ (x_1 - y_1)(i) = 0 \text{ or } (x_2 - y_2)(i) = 0;\qquad \forall i \in [n_I+1,n/2],\ (x_1 - y_1)(i) = \pm(x_2 - y_2)(i),$$
where, for $i \leq n_I$, which of the two coordinates vanishes and, for $i > n_I$, the sign only depend on $i$. Now, by using the fact that the distribution of a vector $x = (x_1,x_2)$ picked uniformly at random in $S_w$ is unchanged by multiplying some of its components by $-1$ or by exchanging $x_1(i)$ and $x_2(i)$ for some $i$, we have
$$\mathbb{P}(\mathcal{E}_1) \leq \mathbb{P}\left(\forall i \in [1,n_I],\ (x_1 - y_1)(i) = 0,\ \forall i \in [n_I+1,n/2],\ (x_1 - y_1)(i) = (x_2 - y_2)(i)\right).$$

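Let us now compute the right-hand side. Write $m = n/2 - n_I$ and, for a vector $z$, denote by $z_{[a,b]}$ its restriction to the coordinates $a,\dots,b$. Since $x$ and $y$ are independent and identically distributed, this probability is equal to
$$\sum_{\ell=0}^{n_I}\sum_{j=0}^{m}\ \sum_{\substack{e_1\in\mathbb{F}_3^{n_I}:|e_1|=\ell\\ e_2\in\mathbb{F}_3^{m}:|e_2|=j}}\mathbb{P}_x\left((x_1)_{[1,n_I]} = e_1,\ (x_1 - x_2)_{[n_I+1,n/2]} = e_2\right)^2.$$
The weight $\ell$ of $(x_1)_{[1,n_I]}$ satisfies
$$\mathbb{P}\left(|(x_1)_{[1,n_I]}| = \ell\right) = \frac{\binom{n_I}{\ell}2^{\ell}\binom{n-n_I}{w-\ell}2^{w-\ell}}{\binom{n}{w}2^w},$$
and, conditionally on $\ell$, $(x_1)_{[1,n_I]}$ is uniform among the vectors of weight $\ell$ and independent of the $n - n_I$ remaining coordinates of $x$, which are uniform among the vectors of weight $w - \ell$. By the same symmetry argument as for (58), conditionally on $\ell$ and on its weight $j$, $(x_1 - x_2)_{[n_I+1,n/2]}$ is uniform among the $\binom{m}{j}2^j$ vectors of weight $j$. To count the remaining coordinates of $x$ with $|(x_1 - x_2)_{[n_I+1,n/2]}| = j$, let $h$ be the number of positions $i \in [n_I+1,n/2]$ where $x_1(i)$ and $x_2(i)$ are both nonzero and distinct (2 possibilities each); the $j - h$ other positions where $x_1(i) \neq x_2(i)$ have exactly one nonzero coordinate (4 possibilities each); let $p$ be the number of positions where $x_1(i) = x_2(i) \neq 0$ (2 possibilities each). The remaining weight $w - \ell - j - h - 2p$ is carried by $(x_2)_{[1,n_I]}$. This gives
$$\mathbb{P}\left(|(x_1 - x_2)_{[n_I+1,n/2]}| = j \,\middle|\, \ell\right) = \frac{1}{\binom{n-n_I}{w-\ell}2^{w-\ell}}\sum_{h=0}^{j}\sum_{p=0}^{m-j}\binom{m}{h}\binom{m-h}{j-h}\binom{m-j}{p}\binom{n_I}{w-\ell-j-h-2p}2^{2(j-h)}2^{h}2^{p}2^{w-\ell-j-h-2p}.$$
Putting everything together, for $|e_1| = \ell$ and $|e_2| = j$,
$$\mathbb{P}_x\left((x_1)_{[1,n_I]} = e_1,\ (x_1 - x_2)_{[n_I+1,n/2]} = e_2\right) = \frac{\binom{n-n_I}{w-\ell}2^{w-\ell}}{\binom{n}{w}2^w}\cdot\frac{\mathbb{P}\left(|(x_1 - x_2)_{[n_I+1,n/2]}| = j \mid \ell\right)}{\binom{m}{j}2^{j}},$$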
w — j — h — 2 pj 
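which gives, after summing over the $\binom{n_I}{\ell}2^{\ell}$ possible vectors $e_1$ and the $\binom{n/2-n_I}{j}2^{j}$ possible vectors $e_2$,
$$\mathbb{P}(\mathcal{E}_1) \leq \sum_{\ell=0}^{n_I}\frac{\binom{n_I}{\ell}2^{\ell}}{\binom{n}{w}^2 2^{2w}}\sum_{j=0}^{n/2-n_I}\frac{1}{\binom{n/2-n_I}{j}2^{j}}\left(\sum_{h=0}^{j}\sum_{p=0}^{n/2-n_I-j}\binom{n/2-n_I}{h}\binom{n/2-n_I-h}{j-h}\binom{n/2-n_I-j}{p}\binom{n_I}{w-\ell-j-h-2p}2^{w-\ell+j-2h-p}\right)^2. \qquad (60)$$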

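Therefore, with Equations (56), (57), (59) and (60), we finally obtain
$$\mathbb{P}_{H_{\mathrm{pk}},x,y}\left(xH_{\mathrm{pk}}^T = yH_{\mathrm{pk}}^T\right) \leq \frac{1}{3^{n-k}}(1+\varepsilon)$$
with
$$\begin{aligned}\varepsilon ={}& \frac{3^{n-k}}{\binom{n}{w}2^{w}} + 3^{n/2-k_V}\sum_{j=0}^{n/2}\frac{\left(\sum_{p=0,\ w+p\equiv 0 \bmod 2}^{j}\binom{n/2}{\frac{w+p}{2}}\binom{\frac{w+p}{2}}{j}\binom{j}{p}2^{\frac{w+3p}{2}}\right)^{2}}{\binom{n/2}{j}2^{j}\binom{n}{w}^{2}2^{2w}}\\ &+ 3^{n/2-k_U}\sum_{\ell=0}^{n_I}\frac{\binom{n_I}{\ell}2^{\ell}}{\binom{n}{w}^{2}2^{2w}}\sum_{j=0}^{n/2-n_I}\frac{1}{\binom{n/2-n_I}{j}2^{j}}\left(\sum_{h=0}^{j}\sum_{p=0}^{n/2-n_I-j}\binom{n/2-n_I}{h}\binom{n/2-n_I-h}{j-h}\binom{n/2-n_I-j}{p}\binom{n_I}{w-\ell-j-h-2p}2^{w-\ell+j-2h-p}\right)^{2},\end{aligned}\qquad (61)$$
which concludes the proof.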

Lemmas 10 and 1 imply directly Proposition 3. 

Proof (Proposition 3). Indeed, we let in Lemma 1 $E = S_w$, $F = \mathbb{F}_3^{n-k}$ and $\mathcal{H}$ be the set of functions associated to the 4-tuples $(H_U, H_V, S, P)$ used to generate a public parity-check matrix $H_{\mathrm{pk}}$. These functions $h$ are given by $h(e) = eH_{\mathrm{pk}}^T$. Lemma 10 gives an upper bound for the $\varepsilon$ term in Lemma 1, and this finishes the proof of Proposition 3.

We are now able to prove Lemma 4 (we use here notations of the security proof in §6.3). 

Lemma 4.
$$\mathbb{P}(S_1) \leq \mathbb{P}(S_2) + \frac{q_{\mathrm{hash}}}{2}\sqrt{\varepsilon},$$
where $\varepsilon$ is given in Proposition 3.

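Proof (Lemma 4). To simplify notation we let $q = q_{\mathrm{hash}}$. Then we notice that
$$\mathbb{P}(S_1) \leq \mathbb{P}(S_2) + \rho\left(\mathcal{D}_{\mathrm{pub}}^{0},\mathcal{D}_{w}^{\otimes q}\right), \qquad (62)$$
where

- $\mathcal{U}$ is the uniform distribution over $\mathbb{F}_3^{n-k}$;

- $\mathcal{D}_{w}^{\otimes q}$ is the distribution of the $(q+1)$-tuples $(H_{\mathrm{pk}}, e_1H_{\mathrm{pk}}^T,\dots,e_qH_{\mathrm{pk}}^T)$ where the $e_i$'s are independent and uniformly distributed in $S_w$;

- $\mathcal{D}_{\mathrm{pub}}^{0}$ is the distribution of the $(q+1)$-tuples $(H_{\mathrm{pk}}, s_1,\dots,s_q)$ where the $s_i$'s are independent and uniformly distributed in $\mathbb{F}_3^{n-k}$.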
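We now observe that
$$\begin{aligned}\rho\left(\mathcal{D}_{\mathrm{pub}}^{0},\mathcal{D}_{w}^{\otimes q}\right) &= \sum_{H\in\mathbb{F}_3^{(n-k)\times n}}\mathbb{P}(H_{\mathrm{pk}} = H)\,\rho\left(\mathcal{U}^{\otimes q},\left(\mathcal{D}_w^{H}\right)^{\otimes q}\right)\\ &\leq q\sum_{H\in\mathbb{F}_3^{(n-k)\times n}}\mathbb{P}(H_{\mathrm{pk}} = H)\,\rho\left(\mathcal{D}_w^{H},\mathcal{U}\right) \quad\text{(by Prop. 4)}\\ &= q\,\mathbb{E}_{H_{\mathrm{pk}}}\left\{\rho\left(\mathcal{D}_w^{H_{\mathrm{pk}}},\mathcal{U}\right)\right\}\\ &\leq \frac{q}{2}\sqrt{\varepsilon} \quad\text{(by Prop. 3).}\end{aligned}$$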

C Proof of Proposition 7 

Let us recall Proposition 7 

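Proposition 7. Assume that we choose an admissible generalized $(U,U+V)$ code over $\mathbb{F}_3$ with a number $n_I$ of linear combinations of type I, by picking the parity-check matrices of $U$ and $V$ uniformly at random among the ternary matrices of size $(n/2 - k_U)\times n/2$ and $(n/2 - k_V)\times n/2$ respectively. Let $a_{(U,V)}(w)$, $a_{(U,0)}(w)$ and $a_{(0,V)}(w)$ be the expected numbers of codewords of weight $w$ that are respectively in the admissible generalized $(U,U+V)$ code, of the form $(uD_1,uD_3)$ where $u$ belongs to $U$, and of the form $(vD_2,vD_4)$ where $v$ belongs to $V$. These numbers are given for even $w$ with $0 < w \leq n$ by
$$a_{(U,0)}(w) = \frac{\binom{n/2}{w/2}2^{w/2}}{3^{n/2-k_U}},\qquad a_{(0,V)}(w) = \frac{1}{3^{n/2-k_V}}\sum_{\substack{j=0\\ j\text{ even}}}^{n_I}\binom{n_I}{j}\binom{n/2-n_I}{\frac{w-j}{2}}2^{\frac{w+j}{2}},$$
$$a_{(U,V)}(w) = a_{(U,0)}(w) + a_{(0,V)}(w) + \frac{1}{3^{n-k_U-k_V}}\left(\binom{n}{w}2^w - \binom{n/2}{w/2}2^{w/2} - \sum_{\substack{j=0\\ j\text{ even}}}^{n_I}\binom{n_I}{j}\binom{n/2-n_I}{\frac{w-j}{2}}2^{\frac{w+j}{2}}\right),$$
and for odd $w$ with $0 < w \leq n$ by
$$a_{(U,0)}(w) = 0,\qquad a_{(0,V)}(w) = \frac{1}{3^{n/2-k_V}}\sum_{\substack{j=0\\ j\text{ odd}}}^{n_I}\binom{n_I}{j}\binom{n/2-n_I}{\frac{w-j}{2}}2^{\frac{w+j}{2}},$$
$$a_{(U,V)}(w) = a_{(0,V)}(w) + \frac{1}{3^{n-k_U-k_V}}\left(\binom{n}{w}2^w - \sum_{\substack{j=0\\ j\text{ odd}}}^{n_I}\binom{n_I}{j}\binom{n/2-n_I}{\frac{w-j}{2}}2^{\frac{w+j}{2}}\right),$$
where binomial coefficients with a negative lower index are taken to be zero.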


On the other hand, when we choose a linear code of length $n$ over $\mathbb{F}_3$ with a parity-check matrix of size $(n - k_U - k_V)\times n$ chosen uniformly at random, the expected number $a(w)$ of codewords of weight $w > 0$ is given by
$$a(w) = \frac{\binom{n}{w}2^w}{3^{n-k_U-k_V}}.$$


Lemma 9 in Appendix §B will be useful for the proof. The last part of Proposition 7 is a direct application of this lemma. Namely, we have:
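Proposition 12. Let $a(w)$ be the expected number of codewords of weight $w$ in a ternary linear code $\mathcal{C}$ of length $n$ whose parity-check matrix $H$ is chosen uniformly at random among all ternary matrices of size $r\times n$. We have, for $w > 0$,
$$a(w) = \frac{\binom{n}{w}2^w}{3^r}.$$

Proof. Let $Z = \sum_{x\in\mathbb{F}_3^n:|x|=w}Z_x$, where $Z_x$ is the indicator function of the event “$x$ is in $\mathcal{C}$”. We have
$$a(w) = \mathbb{E}(Z) = \sum_{x\in\mathbb{F}_3^n:|x|=w}\mathbb{E}(Z_x) = \sum_{x\in\mathbb{F}_3^n:|x|=w}\mathbb{P}(x\in\mathcal{C}) = \sum_{x\in\mathbb{F}_3^n:|x|=w}\mathbb{P}(xH^T = 0) = \sum_{x\in\mathbb{F}_3^n:|x|=w}\frac{1}{3^r} = \frac{\binom{n}{w}2^w}{3^r},$$
where the penultimate equality follows from Lemma 9 since $x \neq 0$.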
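Proposition 12 can be verified exhaustively for very small codes. The sketch below, with hypothetical function names, averages the number of weight-$w$ codewords over all $r\times n$ ternary parity-check matrices. The zero word is always a codeword, which is why the formula is only claimed for $w > 0$.

```python
# Illustrative sketch: hypothetical function names; exhaustive over all r x n ternary matrices.
from fractions import Fraction
from itertools import product
from math import comb


def weight(v) -> int:
    """Hamming weight of a ternary vector (symbols 0, 1, 2)."""
    return sum(1 for a in v if a % 3 != 0)


def average_weight_count(n: int, r: int, w: int) -> Fraction:
    """Average number of weight-w words x with x H^T = 0 over all r x n ternary matrices H."""
    words = [x for x in product(range(3), repeat=n) if weight(x) == w]
    matrices = list(product(range(3), repeat=r * n))
    total = 0
    for entries in matrices:
        rows = [entries[i * n:(i + 1) * n] for i in range(r)]
        total += sum(all(sum(h * c for h, c in zip(row, x)) % 3 == 0 for row in rows) for x in words)
    return Fraction(total, len(matrices))


def prop12(n: int, r: int, w: int) -> Fraction:
    """Proposition 12: a(w) = binom(n, w) 2^w / 3^r, valid for w > 0."""
    return Fraction(comb(n, w) * 2**w, 3**r)


print([average_weight_count(3, 1, w) == prop12(3, 1, w) for w in range(1, 4)])
```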


This proves the part of Proposition 7 dealing with the expected weight distribution of a random linear code. We are now ready to prove the part of Proposition 7 concerning the expected weight distribution of a random $(U,U+V)$ code.

Weight distributions of $(UD_1,UD_3) = \{(uD_1,uD_3) : u\in U\}$ and $(VD_2,VD_4) = \{(vD_2,vD_4) : v\in V\}$. Let us recall that $(UD_1 + VD_2, UD_3 + VD_4)$ is an admissible generalized code, which enforces that
$$\forall i \in [1,n/2],\quad D_1(i,i)D_3(i,i) \neq 0.$$
Therefore the weight of $(uD_1,uD_3)$ is $2|u|$, and the formula for $a_{(U,0)}(w)$ follows directly from Proposition 12, since $a_{(U,0)}(w) = 0$ for odd $w$ and, when $w$ is even, $a_{(U,0)}(w)$ is equal to the expected number of codewords of weight $w/2$ in a random linear code of length $n/2$ with a parity-check matrix of size $(n/2 - k_U)\times n/2$. On the other hand, the weight distribution of $(vD_2,vD_4)$ for $v\in V$ is a little more sophisticated. Let us recall the following definition:


Definition 3 (number of V blocks of type I). In a generalized $(U,U+V)$ code of length $n$ associated to the 4-tuple of diagonal matrices of size $n/2$ $(D_1,D_2,D_3,D_4)$, the number of $V$ blocks of type I, which we denote by $n_I$, is defined by:
$$n_I = \left|\{1 \leq i \leq n/2 : D_2(i,i)D_4(i,i) = 0\}\right|,$$
where, from the definition of generalized $(U,U+V)$ codes, when either $D_2(i,i) = 0$ or $D_4(i,i) = 0$, the other one is necessarily different from $0$. In this way, a nonzero coordinate $v(i)$ contributes $1$ to the weight of $(vD_2,vD_4)$ when $i$ is one of the $n_I$ positions of type I, and $2$ otherwise. Hence $a_{(0,V)}(w)$ is the sum, over $j \in [0,n_I]$, of the expected number of codewords $v$ of a random linear code of length $n/2$ having $j$ nonzero coordinates among the $n_I$ positions of type I and $\frac{w-j}{2}$ nonzero coordinates among the $n/2 - n_I$ other positions. Furthermore, this code has a parity-check matrix of size $(n/2 - k_V)\times n/2$. This easily gives, by the same computation as in the proof of Proposition 12:


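$$a_{(0,V)}(w) = \frac{1}{3^{n/2-k_V}}\sum_{\substack{j=0\\ j\equiv w \bmod 2}}^{n_I}\binom{n_I}{j}\binom{n/2-n_I}{\frac{w-j}{2}}2^{\frac{w+j}{2}}.$$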


Weight distributions of $(UD_1 + VD_2, UD_3 + VD_4)$. The admissible generalized $(U,U+V)$-code is chosen randomly by picking a parity-check matrix $H_U$ of $U$ uniformly at random among the set of $(n/2 - k_U)\times n/2$ ternary matrices and a parity-check matrix $H_V$ of $V$ uniformly at random among the set of $(n/2 - k_V)\times n/2$ ternary matrices. Let $Z = \sum_{x\in\mathbb{F}_3^n:|x|=w}Z_x$, where $Z_x$ is the indicator function of the event “$x$ is in $(UD_1 + VD_2, UD_3 + VD_4)$”.

We have
$$a_{(U,V)}(w) = \mathbb{E}(Z) = \sum_{x\in\mathbb{F}_3^n:|x|=w}\mathbb{E}(Z_x) = \sum_{x\in\mathbb{F}_3^n:|x|=w}\mathbb{P}(Z_x = 1) = \sum_{x\in\mathbb{F}_3^n:|x|=w}\mathbb{P}\left(x\in(UD_1 + VD_2, UD_3 + VD_4)\right). \qquad (63)$$

Let us recall now that a parity-check matrix of the code $(UD_1 + VD_2, UD_3 + VD_4)$ is:
$$\begin{pmatrix} H_UD_4M & -H_UD_2M\\ H_VD_3M & -H_VD_1M\end{pmatrix},$$
where $M = (D_1D_4 - D_3D_2)^{-1}$ is a diagonal invertible matrix. Therefore, by writing $x = (x_1,x_2)$ where $x_1, x_2 \in \mathbb{F}_3^{n/2}$, we know that $x$ is in $(UD_1 + VD_2, UD_3 + VD_4)$ if and only if at the same time:
$$(x_1D_4 - x_2D_2)MH_U^T = 0,\qquad (x_1D_3 - x_2D_1)MH_V^T = 0.$$
There are three disjoint cases to consider for $x \neq 0$ (recall that $D = \begin{pmatrix}D_1 & D_2\\ D_3 & D_4\end{pmatrix}$ is invertible, so that $x_1D_4 = x_2D_2$ and $x_1D_3 = x_2D_1$ cannot hold simultaneously).

Case 1: $x_1D_4 = x_2D_2$. In this case
$$\mathbb{P}\left(x\in(UD_1 + VD_2, UD_3 + VD_4)\right) = \mathbb{P}\left((x_1D_3 - x_2D_1)MH_V^T = 0\right) = \frac{1}{3^{n/2-k_V}}. \qquad (64)$$

Case 2: $x_1D_3 = x_2D_1$. In this case
$$\mathbb{P}\left(x\in(UD_1 + VD_2, UD_3 + VD_4)\right) = \mathbb{P}\left((x_1D_4 - x_2D_2)MH_U^T = 0\right) = \frac{1}{3^{n/2-k_U}}. \qquad (65)$$

Case 3: $x_1D_4 \neq x_2D_2$ and $x_1D_3 \neq x_2D_1$. In this case
$$\mathbb{P}\left(x\in(UD_1 + VD_2, UD_3 + VD_4)\right) = \mathbb{P}\left((x_1D_3 - x_2D_1)MH_V^T = 0,\ (x_1D_4 - x_2D_2)MH_U^T = 0\right) = \frac{1}{3^{n/2-k_U}\,3^{n/2-k_V}}. \qquad (66)$$

Note that in each case we used Lemma 9 (together with the independence of $H_U$ and $H_V$ in Case 3).

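By substituting $\mathbb{P}(x\in(UD_1 + VD_2, UD_3 + VD_4))$ in (63) and using the definition of the number of blocks of type I (Definition 3), the words of Case 1 being exactly the nonzero words $(vD_2,vD_4)$ and those of Case 2 the nonzero words $(uD_1,uD_3)$, we obtain for even $0 < w \leq n$
$$a_{(U,V)}(w) = a_{(U,0)}(w) + a_{(0,V)}(w) + \frac{1}{3^{n-k_U-k_V}}\left(\binom{n}{w}2^w - \binom{n/2}{w/2}2^{w/2} - \sum_{\substack{j=0\\ j\text{ even}}}^{n_I}\binom{n_I}{j}\binom{n/2-n_I}{\frac{w-j}{2}}2^{\frac{w+j}{2}}\right)$$
and for odd $w \leq n$
$$a_{(U,V)}(w) = a_{(0,V)}(w) + \frac{1}{3^{n-k_U-k_V}}\left(\binom{n}{w}2^w - \sum_{\substack{j=0\\ j\text{ odd}}}^{n_I}\binom{n_I}{j}\binom{n/2-n_I}{\frac{w-j}{2}}2^{\frac{w+j}{2}}\right),$$
which concludes the proof.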
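As a final sanity check of Proposition 7, the following sketch compares the formula for $a_{(U,V)}(w)$ with an exhaustive average over all $H_U$ and $H_V$ for $n = 4$. The diagonal matrices $D_1 = \mathrm{diag}(1,1)$, $D_2 = \mathrm{diag}(0,1)$, $D_3 = \mathrm{diag}(1,1)$, $D_4 = \mathrm{diag}(1,-1)$ are a hypothetical toy choice. They satisfy the properties used in the proof ($D_1(i,i)D_3(i,i) \neq 0$, $M$ invertible, exactly one of $D_2(i,i)$, $D_4(i,i)$ equal to zero on the $n_I = 1$ block of type I), but they are not claimed to satisfy every admissibility condition of the scheme.

```python
# Illustrative sketch: toy diagonals and helper names are assumptions; 2 stands for -1 in F_3.
from fractions import Fraction
from itertools import product
from math import comb


def weight(v) -> int:
    """Hamming weight of a ternary vector."""
    return sum(1 for a in v if a % 3 != 0)


def prop7_formula(n: int, n_I: int, k_U: int, k_V: int, w: int) -> Fraction:
    """a_(U,V)(w) of Proposition 7 for 0 < w <= n."""
    half = n // 2
    # number of words (u D1, u D3) of weight w, i.e. |u| = w/2
    n_u0 = comb(half, w // 2) * 2 ** (w // 2) if w % 2 == 0 else 0
    # number of words (v D2, v D4) of weight w: j nonzeros on type-I blocks, (w - j)/2 elsewhere
    n_0v = sum(comb(n_I, j) * comb(half - n_I, (w - j) // 2) * 2 ** ((w + j) // 2)
               for j in range(min(n_I, w) + 1) if (w - j) % 2 == 0)
    rest = comb(n, w) * 2**w - n_u0 - n_0v
    return (Fraction(n_u0, 3 ** (half - k_U)) + Fraction(n_0v, 3 ** (half - k_V))
            + Fraction(rest, 3 ** (n - k_U - k_V)))


def prop7_bruteforce(n, k_U, k_V, D1, D2, D3, D4, w) -> Fraction:
    """Average number of weight-w codewords of (U D1 + V D2, U D3 + V D4) over all H_U, H_V.

    D1..D4 are the diagonals; M = (D1 D4 - D3 D2)^{-1} entrywise, and a^{-1} = a for a in {1, 2}.
    """
    half = n // 2
    M = [(a * d - c * b) % 3 for a, b, c, d in zip(D1, D2, D3, D4)]
    words = [x for x in product(range(3), repeat=n) if weight(x) == w]
    # x is a codeword iff (x1 D4 - x2 D2) M H_U^T = 0 and (x1 D3 - x2 D1) M H_V^T = 0
    parts = [(tuple((x[i] * D4[i] - x[half + i] * D2[i]) * M[i] % 3 for i in range(half)),
              tuple((x[i] * D3[i] - x[half + i] * D1[i]) * M[i] % 3 for i in range(half)))
             for x in words]

    def in_kernel(z, entries, rows):
        return all(sum(entries[t * half + i] * z[i] for i in range(half)) % 3 == 0 for t in range(rows))

    all_HU = list(product(range(3), repeat=(half - k_U) * half))
    all_HV = list(product(range(3), repeat=(half - k_V) * half))
    total = sum(in_kernel(zu, HU, half - k_U) and in_kernel(zv, HV, half - k_V)
                for HU in all_HU for HV in all_HV for zu, zv in parts)
    return Fraction(total, len(all_HU) * len(all_HV))


# Hypothetical toy diagonals: n = 4, one block of type I (D2(1,1) = 0).
D1, D2, D3, D4 = [1, 1], [0, 1], [1, 1], [1, 2]
print([prop7_bruteforce(4, 1, 1, D1, D2, D3, D4, w) == prop7_formula(4, 1, 1, 1, w) for w in range(1, 5)])
```

The brute force keeps the factor $M$ in the membership test, so it also illustrates that $M$ does not affect the expected weight distribution once $H_U$ and $H_V$ are uniform.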



